In vivo incorporation of an unnatural amino acid comprising a 1,2-aminothiol group

ABSTRACT

The invention relates to orthogonal pairs of tRNAs and aminoacyl-tRNA synthetases that can incorporate unnatural amino acids that comprise a 1,2 aminothiol group into polypeptides. The invention provides translation systems in which polypeptides comprising unnatural amino acids that comprise a 1,2 aminothiol group can be produced. The invention also provides methods for producing polypeptides containing unnatural amino acids that comprise a 1,2 aminothiol group. Also provided by the invention are compositions comprising orthogonal aminoacyl-tRNA synthetases that preferentially aminoacylate a cognate orthogonal tRNA with unnatural amino acids that comprise a 1,2 aminothiol group. The invention provides methods for the synthesis of the unnatural amino acid 2-amino-3-(4-(2-amino-3-mercaptopropan-amido)phenyl)-propanoic acid.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and benefit of U.S. Provisional Patent Application Ser. No. 61/067,524, entitled, “IN VIVO INCORPORATION OF AN UNNATURAL AMINO ACID COMPRISING A 1,2-AMINOTHIOL GROUP,” by Simon Ficht, et al., filed Feb. 27, 2008, the contents of which are incorporated herein by reference in their entirety for all purposes.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

The invention was made with United States Government support under Grant DE-FG02-03ER46051 from the Department of Energy. The United States Government has certain rights in the invention.

FIELD OF THE INVENTION

The invention is in the field of translation biochemistry. The invention relates to compositions and methods for making and using orthogonal tRNAs, orthogonal aminoacyl-tRNA synthetases, and O-RS/O-tRNA pairs that incorporate unnatural amino acids that comprise a 1,2 aminothiol group into proteins. The invention also relates to methods of producing proteins comprising such unnatural amino acids in cells using such orthogonal pairs. In addition, the invention relates to proteins made by these methods.

BACKGROUND OF THE INVENTION

Native chemical ligation (NCL) is a non-enzymatic, highly chemoselective reaction that proceeds efficiently in aqueous conditions at physiological pH. In the classical NCL reaction, a peptide comprising an N-terminal cysteine reacts with a moiety, e.g., another peptide, comprising an α-thioester group, e.g., a C-terminal thioester, in the presence of an exogenous thiol catalyst to yield a native peptide bond at the site of ligation (Dawson, et al. (1994) “Synthesis of Proteins by Native Chemical Ligation.” Science 266: 776-779). NCL can be used in protein semisynthesis (Schwarzer, et al. (2005) “Protein semisynthesis and expressed protein ligation: chasing a protein's tail.” Curr Op Chem Biol 9: 561-569), to generate cyclic peptides (Camarero, et al. (2001) “Peptide Chemical Ligation Inside Living Cells: In Vivo Generation of a Circular Protein Domain.” Bio Med Chem Lett 9: 2479-2484), to generate protein-liposome conjugates (Reulen, et al. (2007) “Protein-Liposome Conjugates Using Cysteine-Lipids And Native Chemical Ligation.” Bioconjugate Chem 18: 590-596), and to conjugate proteins to small molecule probes in vivo (Yeo, et al. (2003) “Cell-permeable small-molecule probes for site-specific labeling of proteins.” Chem Commun 23: 2870-2871). NCL reactions have a variety of applications in biotechnology, biomedical research, and chemical biology, including peptide synthesis (Low, et al. (2001) “Total synthesis of cytochrome b562 by native chemical ligation using a removable auxiliary.” Proc Natl Acad Sci USA 98: 6554-6559), chemical targeting of biomolecules in vivo (Yeo, et al. (2003) “Cell-permeable small-molecule probes for site-specific labeling of proteins.” Chem Commun 23: 2870-2871), and immobilization of proteins on surfaces and resins (Girish, et al. (2005) “Site-specific immobilization of proteins in a microarray using intein-mediated protein splicing.” Bio Med Chem Lett 15: 2447-2451).

The requirement for an N-terminal cysteine residue is an intrinsic limitation of the NCL reaction. The native chemical ligation of peptides comprising N-terminal amino acids other than cysteine has been reported (WO98/28434; Canne, et al. (1996) “Extending the Applicability of Native Chemical Ligation.” J Am Chem Soc 118: 5891-5896), wherein the ligation is performed using a moiety comprising an α-thioester and a peptide or polypeptide segment having an N-terminal N-{thiol-substituted auxiliary} group represented by the formula HS-CH2-CH2-O-NH-[peptide]. Following ligation, the N-{thiol-substituted auxiliary} group is removed by cleaving the HS-CH2-CH2-O-auxiliary group to generate a native peptide bond at the ligation site. However, this approach is suitable only if a practitioner desires that the ligation product contain a glycine residue at the site of bond formation (Canne, et al. (1996) “Extending the Applicability of Native Chemical Ligation.” J Am Chem Soc 118: 5891-5896).

Alternately, removable N^(α)-(1-phenyl-2-mercaptoethyl) auxiliaries can be used to enable NCL reactions in polypeptides that do not contain cysteine residues (Botti, et al. (2001) “Native chemical ligation using removable N^(α)(1-phenyl-2-mercaptoethyl) auxiliaries.” Tetrahedron Lett 42: 1831-1833). In this approach, the (1-phenyl-2-mercaptoethyl) auxiliary on the α-amino group of a polypeptide of interest acts as a 1,2 aminothiol-containing functional group to effect thioester-mediated peptide bond-forming ligation with a second, α-thioester-containing moiety. Subsequent removal of the auxiliary from the newly formed peptide bond generates a ligation product that comprises a native peptide structure. Though this method enables ligation at a variety of amino acids and greatly increases the number of proteins accessible to NCL, the formation of the peptide bond is nevertheless limited to the N-terminus of the peptide comprising the 1-phenyl-2-mercaptoethyl auxiliary moiety.

What are needed in the art are methods and compositions that permit NCL reactions at any desired amino acid position in a polypeptide. Recently, a general method was developed that makes it possible to genetically encode unnatural amino acids in both prokaryotic and eukaryotic organisms through the use of orthogonal tRNA/orthogonal aminoacyl tRNA synthetase pairs (reviewed in Wang and Schultz (2006) “Expanding the Genetic Code,” Ann Rev Biophys Biomol Struct 35: 225-249). This methodology has been used successfully to incorporate unnatural amino acids with unique chemical reactivities into proteins in bacteria and/or yeast (Chin, et al. (2002) “Addition of p-azido-L-phenylalanine to the genetic code of Escherichia coli.” J Am Chem Soc 124: 9026-9027; Deiters, et al. (2004) “Site-specific PEGylation of proteins containing unnatural amino acids.” Bio Med Chem Lett 14: 5743-5745; Wang, et al. (2003) “Addition of the keto functional group to the genetic code of Escherichia coli.” Proc Natl Acad Sci USA 100: 56-61; Zhang, et al. (2003) “A new strategy for the site-specific modification of proteins in vivo.” Biochemistry 42: 6735-46). There is a need in the art for unnatural amino acids that comprise a 1,2 aminothiol functional group and for orthogonal translation components that can incorporate such unnatural amino acids at defined positions into proteins in living cells. The invention described herein fulfills these and other needs, as will be apparent upon review of the following disclosure.

SUMMARY OF THE INVENTION

The invention provides systems, methods, compositions, and kits for incorporating unnatural amino acids that comprise a 1,2 aminothiol group, e.g., 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid or other unnatural amino acids described herein, in response to a selector codon, e.g., an amber stop codon. These compositions include pairs of orthogonal tRNAs (O-tRNAs) and orthogonal aminoacyl tRNA synthetases (O-RSes) that do not interact with or interfere with the components of the translation system in which they are being used. These novel systems, methods, kits, and compositions permit the production of polypeptides comprising translationally incorporated unnatural amino acids that comprise a 1,2 aminothiol group, e.g., any of the unnatural amino acids described herein. Polypeptides that comprise such UAAs find particular use in native chemical ligation (NCL) reactions, where the 1,2 aminothiol moiety can readily and specifically react with a thioester moiety to form a native peptide bond. In addition, the 1,2 aminothiol moiety can readily and specifically react with an aldehyde moiety to form a thiazolidine. These reactions are highly chemoselective, proceed efficiently in aqueous conditions at physiological pH, and can be usefully applied to conjugate polypeptides to a wide range of target molecules, described elsewhere herein. Accordingly, compositions, methods, and systems for the site-specific incorporation of amino acids that comprise a 1,2 aminothiol group, e.g., the unnatural amino acids described herein, are a valuable tool for site-specific polypeptide modification, as demonstrated herein.

In one aspect, the invention provides translation systems. The translation systems comprise an unnatural amino acid that comprises a 1,2 aminothiol group, a first orthogonal aminoacyl-tRNA synthetase (O-RS), and a first orthogonal tRNA (O-tRNA), wherein the first O-RS preferentially aminoacylates the first O-tRNA with the unnatural amino acid that comprises a 1,2 aminothiol group. The unnatural amino acid comprising the 1,2 aminothiol group can be, e.g., 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid or any of the other unnatural amino acids discussed herein. In some aspects, the first O-RS preferentially aminoacylates the first O-tRNA with the unnatural amino acid that comprises a 1,2 aminothiol group with an efficiency that is at least 50% of the efficiency observed for a translation system comprising the first O-tRNA, the unnatural amino acid that comprises a 1,2 aminothiol group, and an aminoacyl-tRNA synthetase comprising the amino acid sequence of SEQ ID NO: 1.

The translation systems can use components derived from a variety of sources. In one embodiment, the O-RS used in the system can comprise an amino acid sequence of SEQ ID NO: 1 or a conservative variant thereof. The conservative variant can comprise a glycine at an amino acid position corresponding to amino acid 32 of SEQ ID NO: 1, an aspartic acid at an amino acid position corresponding to amino acid 65 of SEQ ID NO: 1, an isoleucine at an amino acid position corresponding to amino acid 70 of SEQ ID NO: 1, a glutamic acid at an amino acid position corresponding to amino acid 84 of SEQ ID NO: 1, a threonine at an amino acid position corresponding to amino acid 108 of SEQ ID NO: 1, a tyrosine at an amino acid position corresponding to amino acid 109 of SEQ ID NO: 1, an arginine at an amino acid position corresponding to amino acid 114 of SEQ ID NO: 1, a glycine at an amino acid position corresponding to amino acid 158 of SEQ ID NO: 1, a glutamic acid at an amino acid position corresponding to amino acid 162 of SEQ ID NO: 1, and/or a glycine at an amino acid position corresponding to amino acid 250 of SEQ ID NO: 1.
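By way of illustration only, the residue positions listed above can be checked programmatically. The following Python sketch assumes a candidate sequence that has already been aligned so that its numbering corresponds to SEQ ID NO: 1; the function names and the placeholder sequence are hypothetical, and SEQ ID NO: 1 itself is not reproduced here.

```python
# Residues recited above, keyed by 1-based position corresponding to SEQ ID NO: 1.
SIGNATURE_RESIDUES = {
    32: "G", 65: "D", 70: "I", 84: "E", 108: "T",
    109: "Y", 114: "R", 158: "G", 162: "E", 250: "G",
}


def matching_signature_positions(aligned_seq: str) -> list[int]:
    """Return the recited positions at which aligned_seq carries the recited residue.

    Assumption: aligned_seq uses the same numbering as SEQ ID NO: 1.
    """
    matches = []
    for pos, residue in sorted(SIGNATURE_RESIDUES.items()):
        if pos <= len(aligned_seq) and aligned_seq[pos - 1] == residue:
            matches.append(pos)
    return matches


def make_placeholder(length: int = 260, fill: str = "A") -> str:
    """Build a hypothetical test sequence (NOT SEQ ID NO: 1) with every recited residue."""
    seq = [fill] * length
    for pos, residue in SIGNATURE_RESIDUES.items():
        seq[pos - 1] = residue
    return "".join(seq)


if __name__ == "__main__":
    print(matching_signature_positions(make_placeholder()))
```

Because the recited residues are joined by "and/or", the sketch reports which positions match rather than requiring all ten.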

In some embodiments, the O-tRNA can be an amber suppressor tRNA, an ochre suppressor tRNA, an opal suppressor tRNA, or a tRNA that recognizes a four base codon, a rare codon, or a non-coding codon. In some embodiments, the O-tRNA comprises or is encoded by the polynucleotide sequence of SEQ ID NO: 3.
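As a minimal, non-limiting sketch, in-frame stop-type selector codons can be located in a coding sequence as follows. The codon assignments (amber TAG, ochre TAA, opal TGA in DNA sense) are the standard ones; the function name and the example sequence in the usage note are hypothetical and are not taken from this disclosure.

```python
# Standard stop codons (DNA sense) named by the suppressor tRNA that reads them.
SELECTOR_CODONS = {"amber": "TAG", "ochre": "TAA", "opal": "TGA"}


def find_selector_codons(coding_seq: str, kind: str = "amber") -> list[int]:
    """Return 1-based codon indices of in-frame selector codons of the given kind."""
    target = SELECTOR_CODONS[kind]
    seq = coding_seq.upper().replace("U", "T")  # accept RNA or DNA input
    hits = []
    # Step through the reading frame three bases at a time, starting at base 0.
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] == target:
            hits.append(i // 3 + 1)
    return hits
```

For the hypothetical sequence ATG GCT TAG GGT TAA, the amber codon is the third codon and the ochre codon is the fifth; a TAG that spans two codons (as in ATA GGC) is not reported.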

In some aspects, the translation system optionally comprises a nucleic acid encoding a polypeptide of interest. This nucleic acid comprises at least one selector codon that is recognized by the O-tRNA. The polypeptide of interest encoded by the nucleic acid can comprise a Z-domain, an SH3 domain and/or any of the polypeptide domains discussed herein. The polypeptide of interest encoded by the nucleic acid can be homologous to c-Crk or any of the proteins discussed herein.

In some aspects, the translation system comprises a second orthogonal pair, e.g., a second O-RS and a second O-tRNA that utilize a second unnatural amino acid, so that the system is now able to incorporate at least two different unnatural amino acids at different selected sites in a polypeptide. In this embodiment, the second O-RS preferentially aminoacylates the second O-tRNA with a second unnatural amino acid that is different from the first unnatural amino acid, and the second O-tRNA recognizes a selector codon that is different from the selector codon recognized by the first O-tRNA.

In some embodiments, the translation system comprises a cell, e.g., a mammalian, an insect, a yeast, a bacterial, or an E. coli cell. The type of cell used is not particularly limited, as long as the O-RS and O-tRNA retain their orthogonality in the cell's environment.

Relatedly, the invention also provides methods, which use the translation systems described above, to produce polypeptides having one or more unnatural amino acids that comprise a 1,2 aminothiol group at selected positions. The polypeptides into which such unnatural amino acids can be incorporated are not particularly limited. The methods include providing a translation system comprising: i) an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid or any one of the other unnatural amino acids described herein, ii) a first orthogonal aminoacyl-tRNA synthetase (O-RS), iii) a first orthogonal tRNA (O-tRNA), wherein the first O-RS preferentially aminoacylates the first O-tRNA with the unnatural amino acid that comprises a 1,2 aminothiol group, and iv) a nucleic acid encoding a polypeptide, wherein the nucleic acid comprises at least one selector codon that is recognized by the first O-tRNA. The methods also include incorporating the unnatural amino acid that comprises a 1,2 aminothiol group at a selected position in the polypeptide during translation in response to the selector codon. In some embodiments, the unnatural amino acid is 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid. Providing an O-RS can optionally include providing a nucleic acid that encodes the O-RS, where the nucleic acid comprises the polynucleotide sequence of SEQ ID NO: 2.

In some variations of these methods, the unnatural amino acid that comprises a 1,2 aminothiol group at the selected position in the polypeptide can be reacted with a moiety comprising an aldehyde functional group to form a thiazolidine. Optionally, the unnatural amino acid that comprises a 1,2 aminothiol group can be reacted with a moiety comprising a thioester functional group via native chemical ligation (NCL) to ligate the moiety to the polypeptide at the site of the unnatural amino acid with a peptide bond. The moiety comprising the aldehyde or thioester functional group can optionally be, e.g., a second amino acid in the polypeptide comprising the unnatural amino acid that comprises the 1,2 aminothiol group, a second translationally synthesized polypeptide, a second synthetic peptide, a second semi-synthetic peptide, an oligonucleotide, a DNA, an RNA, a nucleotide analog, an affinity tag (e.g., biotin, FLAG, hexahistidine, etc.), a synthetic drug, a carbohydrate derivative, a fluorophore (e.g., Cascade Blue, Alexa568, Alexa647, etc.), a chromophore (e.g., phytochrome, phycobilin, bilirubin, etc.), a spin label (such as nitroxide), a toxin, a metal chelator (such as nitrilotriacetate), a photocrosslinker (such as p-azidoiodoacetanilide), an NMR probe, an X-ray probe, a pH probe, an IR probe, a dye, a sugar, a hapten, a cofactor, a fatty acid, a terpene (e.g., geraniol, limonene, farnesol, etc.), a polyethylene glycol (e.g., a branched PEG, a linear PEG, PEGs of different molecular weights, etc.), a resin, a solid support, or the like. It will be appreciated by those of skill in the art that the above list should not necessarily be taken as limiting.

The invention also provides a variety of compositions, including nucleic acids and proteins. For example, the invention provides polynucleotides that encode an O-RS polypeptide that preferentially aminoacylates a cognate O-tRNA with an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., those described herein. The O-RS polypeptide can comprise the amino acid sequence of SEQ ID NO: 1 or a conservative variant thereof. In some aspects, the conservative variant polypeptide can aminoacylate a cognate O-tRNA with 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid with an efficiency that is at least 50% of the efficiency observed for a translation system comprising the cognate O-tRNA, the unnatural amino acid, and an aminoacyl-tRNA synthetase comprising the amino acid sequence of SEQ ID NO: 1.

The invention also provides polynucleotides that encode the O-RS polypeptides described above. For example, a polynucleotide can comprise the nucleotide sequence of SEQ ID NO: 2. Vectors and cells that comprise these polynucleotides are also provided by the invention.

Also provided by the invention are methods of producing an O-RS that preferentially aminoacylates an O-tRNA with 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid. The methods include mutating a wild-type aminoacyl-tRNA synthetase and selecting an O-RS mutant that preferentially aminoacylates an O-tRNA with 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid.

In addition, the invention provides methods of synthesizing 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid. The methods include dissolving N-(tert-butoxycarbonyl)-S-(triphenylmethyl)cysteine, 1-(3-dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride, and 1-hydroxybenzotriazole hydrate in anhydrous dimethylformamide to produce solution 1, adding N,N-diisopropylethylamine to solution 1 to produce solution 2, and adding N-(tert-butoxycarbonyl)-4-aminophenylalanine to solution 2 to produce solution 3. The methods include drying solution 3 to produce residue 1, purifying residue 1 to produce solid 1, and dissolving solid 1 in a mixture comprising trifluoroacetic acid, triisopropylsilane, thioanisole and water to produce solution 4. The methods also include drying solution 4 to produce residue 2 and purifying residue 2, which comprises the 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid.

Producing solution 2 can optionally comprise stirring solution 2 under argon for 5 minutes, producing solution 3 can comprise stirring solution 3 for 12 hours, and producing solution 4 can comprise stirring solution 4 for 20 minutes. Optionally, drying solution 3 to produce residue 1 and drying solution 4 to produce residue 2 can comprise removing the solvent from each solution under vacuum. Residue 1 can optionally be purified via column chromatography, and residue 2 can optionally be purified via HPLC to produce purified residue 2, which is then lyophilized to produce 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid.
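For record-keeping purposes, the sequence of synthesis steps described above can be captured as structured data. The following Python sketch is illustrative only: the `Step` class and `total_stir_minutes` helper are hypothetical names, and only the reagents, intermediates, and optional stirring times stated above are encoded; no quantities or temperatures are assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    action: str                          # what is done, as recited above
    product: str                         # named intermediate produced by the step
    stir_minutes: Optional[int] = None   # optional stirring time, if recited


STEPS = [
    Step("dissolve N-(tert-butoxycarbonyl)-S-(triphenylmethyl)cysteine, "
         "1-(3-dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride, and "
         "1-hydroxybenzotriazole hydrate in anhydrous dimethylformamide",
         "solution 1"),
    Step("add N,N-diisopropylethylamine (stir under argon)", "solution 2",
         stir_minutes=5),
    Step("add N-(tert-butoxycarbonyl)-4-aminophenylalanine", "solution 3",
         stir_minutes=12 * 60),
    Step("dry solution 3", "residue 1"),
    Step("purify residue 1 (optionally by column chromatography)", "solid 1"),
    Step("dissolve solid 1 in trifluoroacetic acid, triisopropylsilane, "
         "thioanisole and water", "solution 4", stir_minutes=20),
    Step("dry solution 4", "residue 2"),
    Step("purify residue 2 (optionally by HPLC, then lyophilize)", "product"),
]


def total_stir_minutes(steps: list[Step]) -> int:
    """Sum the recited stirring times; steps without a recited time count as zero."""
    return sum(s.stir_minutes or 0 for s in steps)


if __name__ == "__main__":
    for number, step in enumerate(STEPS, start=1):
        print(f"{number}. {step.action} -> {step.product}")
    print("total stirring time (min):", total_stir_minutes(STEPS))
```

With the optional times recited above (5 minutes, 12 hours, and 20 minutes), the summed stirring time is 745 minutes; drying and purification times are not recited and are therefore not counted.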

Kits are also a feature of the invention. For example, such kits can comprise components for producing a protein comprising one or more unnatural amino acids that comprise a 1,2 aminothiol group. Such components can include, e.g., a nucleic acid comprising a polynucleotide sequence encoding an O-tRNA, a nucleic acid comprising a polynucleotide encoding an O-RS, an O-RS, one or more unnatural amino acids comprising a 1,2 aminothiol group (such as those described herein), and/or reagents for the conjugation of the proteins comprising an unnatural amino acid with a 1,2 aminothiol group with a moiety comprising an aldehyde or thioester functional group (e.g., including, but not limited to, those moieties described above). The kits can optionally include a suitable strain of E. coli host cells for expression of the O-tRNA/O-RS and production of a protein comprising one or more unnatural amino acids that comprise a 1,2 aminothiol group. The kits can also include appropriate reagents and instructions for using polypeptides comprising unnatural amino acids with a 1,2 aminothiol group. Optionally, the kit can include reagents and instructions for the synthesis of 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid. In addition, the kit can include a container to hold the kit components and/or instructional materials for practicing the methods herein with the compositions described above.

Those of skill in the art will appreciate that the methods, kits and compositions provided by the invention can be used alone or in combination. For example, a translation system of the invention can be used in the methods described herein to produce a polypeptide of interest comprising an unnatural amino acid with a 1,2 aminothiol group at a selected position. Alternately or additionally, these methods can be used to produce, e.g., proteins conjugated to one or more moieties (e.g., those described above) that comprise a thioester or aldehyde. One of skill will appreciate further combinations of the features of the invention noted herein.

Definitions

Before describing the present invention in detail, it is to be understood that this invention is not limited to particular devices or biological systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms “a”, “an” and “the” include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to “a surface” includes a combination of two or more surfaces; reference to “bacteria” includes mixtures of bacteria, and the like.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred materials and methods are described herein. In describing and claiming the present invention, the following terminology will be used in accordance with the definitions set out below.

Bacteria: As used herein, the terms “bacteria” and “eubacteria” refer to prokaryotic organisms that are distinguishable from Archaea. Similarly, Archaea refers to prokaryotes that are distinguishable from eubacteria. Eubacteria and Archaea can be distinguished by a number of morphological and biochemical criteria. For example, differences in ribosomal RNA sequences, RNA polymerase structure, the presence or absence of introns, antibiotic sensitivity, the presence or absence of cell wall peptidoglycans and other cell wall components, the branched versus unbranched structures of membrane lipids, and the presence/absence of histones and histone-like proteins are used to assign an organism to Eubacteria or Archaea.

Examples of Eubacteria include Escherichia coli, Thermus thermophilus, Bacillus subtilis and Bacillus stearothermophilus. Examples of Archaea include Methanococcus jannaschii (Mj), Methanosarcina mazei (Mm), Methanobacterium thermoautotrophicum (Mt), Methanococcus maripaludis, Methanopyrus kandleri, Halobacterium such as Haloferax volcanii and Halobacterium species NRC-1, Archaeoglobus fulgidus (Af), Pyrococcus furiosus (Pf), Pyrococcus horikoshii (Ph), Pyrobaculum aerophilum, Pyrococcus abyssi, Sulfolobus solfataricus (Ss), Sulfolobus tokodaii, Aeropyrum pernix (Ap), Thermoplasma acidophilum and Thermoplasma volcanium.

Cognate: The term “cognate” refers to components that function together, or have some aspect of specificity for each other, e.g., an orthogonal tRNA and an orthogonal aminoacyl-tRNA synthetase.

Conservative variant: As used herein, the term “conservative variant,” in the context of a translation component, refers to a translation component, e.g., a conservative variant O-tRNA or a conservative variant O-RS, that performs functionally like the base component to which it is similar, e.g., an O-tRNA or O-RS, while having variations in sequence as compared to a reference O-tRNA or O-RS. For example, an O-RS, or a conservative variant of that O-RS, will aminoacylate a cognate O-tRNA with an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid. In this example, the O-RS and the conservative variant O-RS do not have the same amino acid sequences. The conservative variant can have, e.g., one variation, two variations, three variations, four variations, or five or more variations in sequence, as long as the conservative variant is still complementary to, e.g., functions with, the cognate corresponding O-tRNA or O-RS.

In some embodiments, a conservative variant O-RS comprises one or more conservative amino acid substitutions compared to the O-RS from which it was derived. In some embodiments, a conservative variant O-RS comprises one or more conservative amino acid substitutions compared to the O-RS from which it was derived, and furthermore, retains O-RS biological activity; for example, a conservative variant O-RS that retains at least 10% of the biological activity of the parent O-RS molecule from which it was derived, or alternatively, at least 20%, at least 30%, or at least 40%. In some preferred embodiments, the conservative variant O-RS retains at least 50% of the biological activity of the parent O-RS molecule from which it was derived. The conservative amino acid substitutions of a conservative variant O-RS can occur in any domain of the O-RS, including the amino acid binding pocket.

Derived from: As used herein, the term “derived from” refers to a component that is isolated from or made using a specified molecule or organism, or information from the specified molecule or organism. For example, a polypeptide that is derived from a second polypeptide can include an amino acid sequence that is identical or substantially similar to the amino acid sequence of the second polypeptide. In the case of polypeptides, the derived species can be obtained by, for example, naturally occurring mutagenesis, artificial directed mutagenesis or artificial random mutagenesis. The mutagenesis used to derive polypeptides can be intentionally directed or intentionally random, or a mixture of each. The mutagenesis of a polypeptide to create a different polypeptide derived from the first can be a random event, e.g., caused by polymerase infidelity, and the identification of the derived polypeptide can be made by appropriate screening methods, e.g., as discussed herein. Mutagenesis of a polypeptide typically entails manipulation of the polynucleotide that encodes the polypeptide.

Encode: As used herein, the term “encode” refers to any process whereby the information in a polymeric macromolecule or sequence string is used to direct the production of a second molecule or sequence string that is different from the first molecule or sequence string. As used herein, the term is used broadly, and can have a variety of applications. In some aspects, the term “encode” describes the process of semi-conservative DNA replication, where one strand of a double-stranded DNA molecule is used as a template to encode a newly synthesized complementary sister strand by a DNA-dependent DNA polymerase. In another aspect, the term “encode” refers to any process whereby the information in one molecule is used to direct the production of a second molecule that has a different chemical nature from the first molecule. For example, a DNA molecule can encode an RNA molecule, e.g., by the process of transcription incorporating a DNA-dependent RNA polymerase enzyme. Also, an RNA molecule can encode a polypeptide, as in the process of translation. When used to describe the process of translation, the term “encode” also extends to the triplet codon that encodes an amino acid. In some aspects, an RNA molecule can encode a DNA molecule, e.g., by the process of reverse transcription incorporating an RNA-dependent DNA polymerase. In another aspect, a DNA molecule can encode a polypeptide, where it is understood that “encode” as used in that case incorporates both the processes of transcription and translation.

Eukaryote: As used herein, the term “eukaryote” refers to organisms belonging to the Kingdom Eucarya. Eukaryotes are generally distinguishable from prokaryotes by their typically multicellular organization (but not exclusively multicellular, for example, yeast), the presence of a membrane-bound nucleus and other membrane-bound organelles, linear genetic material (i.e., linear chromosomes), the absence of operons, the presence of introns, message capping and poly-A mRNA, and other biochemical characteristics, such as a distinguishing ribosomal structure. Eukaryotic organisms include, for example, animals, e.g., mammals, insects, reptiles, birds, etc., ciliates, plants, e.g., monocots, dicots, algae, etc., fungi, yeasts, flagellates, microsporidia, protists, etc.

Orthogonal: As used herein, the term “orthogonal” refers to a molecule, e.g., an orthogonal tRNA (O-tRNA) and/or an orthogonal aminoacyl-tRNA synthetase (O-RS), that functions with endogenous components of a cell with reduced efficiency as compared to a corresponding molecule that is endogenous to the cell or translation system, or that fails to function with endogenous components of the cell. In the context of tRNAs and aminoacyl-tRNA synthetases, orthogonal refers to an inability or reduced efficiency, e.g., less than 20% efficiency, less than 10% efficiency, less than 5% efficiency, or less than 1% efficiency, of an orthogonal tRNA to function with an endogenous tRNA synthetase compared to an endogenous tRNA to function with the endogenous tRNA synthetase, or of an orthogonal aminoacyl-tRNA synthetase to function with an endogenous tRNA compared to an endogenous tRNA synthetase to function with the endogenous tRNA. The orthogonal molecule lacks a functionally normal endogenous complementary molecule in the cell. For example, an orthogonal tRNA in a cell is aminoacylated by any endogenous RS of the cell with reduced or even zero efficiency, when compared to aminoacylation of an endogenous tRNA by the endogenous RS. In another example, an orthogonal RS aminoacylates any endogenous tRNA of a cell of interest with reduced or even zero efficiency, as compared to aminoacylation of the endogenous tRNA by an endogenous RS. A second orthogonal molecule can be introduced into the cell that operably functions with the first orthogonal molecule. For example, an orthogonal tRNA/RS pair includes introduced complementary components that function together in the cell with an efficiency, e.g., 45% efficiency, 50% efficiency, 60% efficiency, 70% efficiency, 75% efficiency, 80% efficiency, 90% efficiency, 95% efficiency, or 99% or more efficiency, as compared to that of a control, e.g., a corresponding tRNA/RS endogenous pair, or an active orthogonal pair, e.g., an orthogonal tRNA/orthogonal RS pair.

Orthogonal aminoacyl-tRNA synthetase: As used herein, an orthogonal aminoacyl-tRNA synthetase (O-RS) is an enzyme that preferentially aminoacylates the O-tRNA with an amino acid in a translation system of interest. The amino acid that the O-RS loads onto the O-tRNA in the present invention is an unnatural amino acid comprising a 1,2 aminothiol group, e.g., 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid or any of the unnatural amino acids described herein.

Orthogonal tRNA: As used herein, an orthogonal tRNA (O-tRNA) is a tRNA that is orthogonal to a translation system of interest. The O-tRNA can exist charged with, e.g., an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., any of the unnatural amino acids shown in FIG. 1, or in an uncharged state. It is also to be understood that an O-tRNA is optionally charged (aminoacylated) by a cognate orthogonal aminoacyl-tRNA synthetase with an unnatural amino acid comprising a 1,2 aminothiol group. Indeed, it will be appreciated that the O-tRNA of the invention is advantageously used to insert an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., any of the unnatural amino acids described herein, into a growing polypeptide, during translation, in response to a selector codon.

Preferentially aminoacylates: As used herein in reference to orthogonal translation systems, an O-RS “preferentially aminoacylates” a cognate O-tRNA when the O-RS charges the O-tRNA with, e.g., an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., any of the unnatural amino acids depicted in FIG. 1, more efficiently than it charges any endogenous tRNA in an expression system. That is, when the O-tRNA and any given endogenous tRNA are present in a translation system in approximately equal molar ratios, the O-RS will charge the O-tRNA more frequently than it will charge the endogenous tRNA. Preferably, the relative ratio of O-tRNA charged by the O-RS to endogenous tRNA charged by the O-RS is high, preferably resulting in the O-RS charging the O-tRNA exclusively, or nearly exclusively, when the O-tRNA and endogenous tRNA are present in equal molar concentrations in the translation system. The relative ratio between O-tRNA and endogenous tRNA that is charged by the O-RS, when the O-tRNA and O-RS are present at equal molar concentrations, is greater than 1:1, preferably at least about 2:1, more preferably 5:1, still more preferably 10:1, yet more preferably 20:1, still more preferably 50:1, yet more preferably 75:1, still more preferably 95:1, 98:1, 99:1, 100:1, 500:1, 1,000:1, 5,000:1 or higher.

The O-RS “preferentially aminoacylates an O-tRNA with, e.g., an unnatural amino acid that comprises a 1,2 aminothiol group” when (a) the O-RS preferentially aminoacylates the O-tRNA compared to an endogenous tRNA, and (b) where that aminoacylation is specific for, e.g., the unnatural amino acid that comprises a 1,2 aminothiol group as compared to aminoacylation of the O-tRNA by the O-RS with any natural amino acid. That is, when, e.g., the unnatural amino acid that comprises a 1,2 aminothiol group, e.g., any of the unnatural amino acids described herein, and natural amino acids are present in equal molar amounts in a translation system comprising the O-RS and O-tRNA, the O-RS will load the O-tRNA with, e.g., the unnatural amino acid that comprises a 1,2 aminothiol group, e.g., 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid, more frequently than with the natural amino acid. Preferably, the relative ratio of O-tRNA charged with, e.g., the unnatural amino acid that comprises a 1,2 aminothiol group to O-tRNA charged with the natural amino acid is high. More preferably, the O-RS charges the O-tRNA exclusively, or nearly exclusively, with, e.g., the unnatural amino acid that comprises a 1,2 aminothiol group. The relative ratio between charging of the O-tRNA with, e.g., the unnatural amino acid that comprises a 1,2 aminothiol group and charging of the O-tRNA with a natural amino acid, when both the natural and the unnatural amino acid, e.g., any of the unnatural amino acids depicted in FIG. 1, are present in the translation system in equal molar concentrations, is greater than 1:1, preferably at least about 2:1, more preferably 5:1, still more preferably 10:1, yet more preferably 20:1, still more preferably 50:1, yet more preferably 75:1, still more preferably 95:1, 98:1, 99:1, 100:1, 500:1, 1,000:1, 5,000:1 or higher.
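The ratio-based definitions above can be illustrated with a short calculation. The following Python sketch is illustrative only: the function names and the example counts are hypothetical and are not part of the disclosure. It computes a charging ratio from two measured quantities and reports the highest of the listed preference levels that the ratio reaches.

```python
# Preference levels (N in "N:1") listed in the definitions above.
TIERS = [2, 5, 10, 20, 50, 75, 95, 98, 99, 100, 500, 1000, 5000]


def charging_ratio(preferred: float, competing: float) -> float:
    """Ratio of preferred charging events to competing charging events.

    'preferred' may be O-tRNA charged by the O-RS (or O-tRNA charged with the
    unnatural amino acid); 'competing' may be endogenous tRNA charged by the
    O-RS (or O-tRNA charged with a natural amino acid).
    """
    if competing == 0:
        return float("inf")  # exclusive charging
    return preferred / competing


def is_preferential(ratio: float) -> bool:
    """The definition requires a ratio greater than 1:1."""
    return ratio > 1


def highest_tier_met(ratio: float):
    """Largest listed level N such that ratio >= N:1, or None if below 2:1."""
    met = [t for t in TIERS if ratio >= t]
    return max(met) if met else None


# Hypothetical measurement: 980 charged O-tRNAs per 20 charged endogenous tRNAs.
ratio = charging_ratio(980, 20)
print(ratio, is_preferential(ratio), highest_tier_met(ratio))
```

The same helper applies to either half of the definition, since both are expressed as a ratio measured at equal molar concentrations of the two competing substrates.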

Prokaryote: As used herein, the term “prokaryote” refers to organisms belonging to the Kingdom Monera (also termed Procarya). Prokaryotic organisms are generally distinguishable from eukaryotes by their unicellular organization, asexual reproduction by budding or fission, the lack of a membrane-bound nucleus or other membrane-bound organelles, a circular chromosome, the presence of operons, the absence of introns, message capping and poly-A mRNA, and other biochemical characteristics, such as a distinguishing ribosomal structure. The Procarya include subkingdoms Eubacteria and Archaea (sometimes termed “Archaebacteria”). Cyanobacteria (the blue green algae) and mycoplasma are sometimes given separate classifications under the Kingdom Monera.

Selector codon: The term “selector codon” refers to codons recognized by the O-tRNA in the translation process and not recognized by an endogenous tRNA. The O-tRNA anticodon loop recognizes the selector codon on the mRNA and incorporates the amino acid with which it is charged, e.g., an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., any of the unnatural amino acids shown in FIG. 1, at this site in the polypeptide. Selector codons can include, e.g., nonsense codons, such as, stop codons, e.g., amber, ochre, and opal codons; four or more base codons; rare codons; noncoding codons; and codons derived from natural or unnatural base pairs and/or the like.

Translation system: The term “translation system” refers to the components that incorporate an amino acid into a growing polypeptide chain (protein). Components of a translation system can include, e.g., ribosomes, tRNAs, synthetases, mRNA and the like. The O-tRNA and/or the O-RSs of the invention can be added to or be part of an in vitro or in vivo translation system, e.g., in a non-eukaryotic cell, e.g., a bacterium, such as E. coli, or in a eukaryotic cell, e.g., a yeast cell, a mammalian cell, a plant cell, an algae cell, a fungus cell, an insect cell, and/or the like.

Unnatural amino acid: As used herein, the term “unnatural amino acid” refers to any amino acid, modified amino acid, and/or amino acid analogue, that is not one of the 20 common naturally occurring amino acids. For example, unnatural amino acids that comprise 1,2 aminothiol groups, e.g., 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid (see FIG. 1), find use with the invention. 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid can also be called 4-(L-cysteinylamino)-L-phenylalanine. However, this unnatural amino acid will be referred to as 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid throughout the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts the chemical structure of 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid and other unnatural amino acids that each comprise a 1,2 aminothiol group.

FIG. 2 depicts a reaction scheme for the synthesis of 2-(tert-butoxycarbonylamino)-3-(4-(2-(tert-butoxycarbonylamino)-3-(tritylthio)propanamido)phenyl)propanoic acid.

FIG. 3 depicts a reaction scheme for the synthesis of 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid.

FIG. 4 depicts an SDS PAGE on which purified protein samples derived from cultures grown in the absence or presence of 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid were run.

FIG. 5 depicts the chemical structure of fluorescein-MES-thioester (C₂₃H₁₇O₉S₂⁻). Exact Mass: 501.03; Molecular Weight: 501.51; C, 55.08; H, 3.42; O, 28.71; S, 12.79.
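The molecular weight and elemental composition quoted for the FIG. 5 compound follow from its formula. The Python sketch below recomputes them for C23H17O9S2 as a consistency check. The atomic weights are approximate standard values assumed here, not taken from this disclosure, and the helper names are hypothetical. The mass fractions must sum to 100%.

```python
import re

# Approximate standard atomic weights (assumed values, not from the document).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999, "S": 32.06}


def parse_formula(formula: str) -> dict:
    """Parse a simple formula such as 'C23H17O9S2' into element counts."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def molecular_weight(formula: str) -> float:
    """Sum of atomic weights over all atoms in the formula."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in parse_formula(formula).items())


def percent_composition(formula: str) -> dict:
    """Mass percentage contributed by each element."""
    mw = molecular_weight(formula)
    return {el: 100.0 * ATOMIC_WEIGHT[el] * n / mw
            for el, n in parse_formula(formula).items()}


mw = molecular_weight("C23H17O9S2")
pct = percent_composition("C23H17O9S2")
print(round(mw, 2), {el: round(p, 2) for el, p in pct.items()})
```

Small differences in the last decimal place are expected, because different atomic-weight tables round slightly differently.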

FIG. 6 depicts an SDS PAGE on which samples of an NCL reaction mixture comprising the fluorescein-MES-thioester of FIG. 5 and a Z-domain protein mutant comprising a 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid residue at amino acid position 7 were run.

FIG. 7 depicts the SDS PAGE of FIG. 6 under UV light.

FIG. 8 depicts the reaction scheme for the site-specific PEGylation of a Z-domain protein mutant comprising a 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid residue at amino acid position 7.

FIG. 9 provides various nucleotide and amino acid sequences finding use with the invention.

FIG. 10 depicts results from experiments performed to ligate PEG-aldehydes to a Z-domain protein mutant comprising a 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid residue at amino acid position 7 via native chemical ligation.

DETAILED DESCRIPTION

The invention described herein provides methods and compositions for the incorporation of unnatural amino acids that comprise a 1,2 aminothiol group (FIG. 1) into polypeptides using orthogonal translation systems. The incorporation of an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., any of the unnatural amino acids described herein, into a polypeptide of interest can be programmed to occur at any desired position by engineering the polynucleotide encoding the polypeptide of interest to contain a selector codon. The selector codon signals the incorporation of an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., as shown in FIG. 1, into a specific position in the primary structure of the growing polypeptide chain.

The novel compositions provided by the invention include novel orthogonal aminoacyl-tRNA synthetases (O-RSs) that have the ability to charge a suitable cognate suppressor orthogonal tRNA (O-tRNA), e.g., the O-tRNA of SEQ ID NO: 3, with an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid. One example O-RS is a novel mutant of the Methanococcus jannaschii tyrosyl-tRNA synthetase and selectively charges the O-tRNA with, e.g., an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid, in a translation system, e.g., in an E. coli cell. Most preferably, the O-RS and O-tRNA do not substantially cross-react with or interfere with the endogenous translational machinery of the translation system in which they are being used, e.g., the endogenous components of the translational machinery of, e.g., an E. coli cell or other host cell. The O-RS of the invention can include the O-RS of SEQ ID NO: 1. The invention also provides a polynucleotide that encodes this O-RS polypeptide, e.g., SEQ ID NO: 2.

The novel methods provided by the invention include steps for highly efficient and site-specific incorporation of, e.g., unnatural amino acids that comprise a 1,2 aminothiol group, e.g., any of the unnatural amino acids described herein, into polypeptides, e.g., in vivo, e.g., in an E. coli cell, in response to a selector codon, e.g., the amber nonsense codon TAG. These novel methods, as well as the novel compositions, can be used in, but are not limited to, a bacterial host system, e.g., E. coli.
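As an illustration of how a position in a polypeptide can be programmed for incorporation, the following Python sketch replaces the codon for a chosen residue with the amber codon TAG. It is a minimal sketch under stated assumptions: the function name is hypothetical, the toy coding sequence is invented for illustration and is not any sequence of the sequence listing, and only three-base selector codons are handled.

```python
def introduce_selector_codon(coding_dna: str, residue_position: int,
                             selector: str = "TAG") -> str:
    """Replace the codon encoding a 1-based residue position with a selector codon.

    Assumes an in-frame coding sequence whose length is a multiple of three
    and a three-base selector codon (four-base codons are not handled here).
    """
    coding_dna = coding_dna.upper()
    if len(coding_dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if len(selector) != 3:
        raise ValueError("this sketch only supports three-base selector codons")
    n_codons = len(coding_dna) // 3
    if not 1 <= residue_position <= n_codons:
        raise ValueError("residue position is outside the coding sequence")
    start = (residue_position - 1) * 3
    return coding_dna[:start] + selector + coding_dna[start + 3:]


# Toy 9-codon sequence (hypothetical); place the amber codon at residue 7.
toy_gene = "ATGGCTAAAGAACTGGGTTTCCATTAA"
print(introduce_selector_codon(toy_gene, 7))
```

In practice the substitution would be made by site-directed mutagenesis of the expression construct; the string operation above only shows which three bases change.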

The polypeptides into which the unnatural amino acids described herein, e.g., unnatural amino acids that comprise a 1,2 aminothiol group, are incorporated can be efficiently modified under physiological conditions, e.g., in vitro and in vivo, in a highly selective fashion in native chemical ligation (NCL) reactions. In the classical NCL reaction, a peptide comprising an N-terminal cysteine reacts with a moiety, e.g., another peptide, comprising an α-thioester group, e.g., a C-terminal thioester, in the presence of an exogenous thiol catalyst to yield a native peptide bond at the site of ligation (Dawson, et al. (1994) “Synthesis of Proteins by Native Chemical Ligation.” Science 266: 776-779). The invention expands the utility of NCL by permitting the placement of the reactive 1,2 aminothiol at any amino acid position in an expressed polypeptide of interest, thus permitting the attachment of, e.g., a second amino acid in the polypeptide comprising the unnatural amino acid with the 1,2 aminothiol group, a second translationally synthesized polypeptide, a second synthetic peptide, a second semi-synthetic peptide, an oligonucleotide, a DNA, an RNA, a nucleotide analog, an affinity tag (e.g., biotin, FLAG, hexahistidine, etc.), a synthetic drug, a carbohydrate derivative, a fluorophore (e.g., Cascade Blue, Alexa568, Alexa647, etc.), a chromophore (e.g., phytochrome, phycobilin, bilirubin, etc.), a spin label (such as nitroxide), a toxin, a metal chelator (such as nitrilotriacetate), a photocrosslinker (such as p-azidoiodoacetanilide), an NMR probe, an X-ray probe, a pH probe, an IR probe, a dye, a sugar, a hapten, a cofactor, a fatty acid, a terpene (e.g., geraniol, limonene, farnesol, etc.), a polyethylene glycol (e.g., a branched PEG, a linear PEG, PEGs of different molecular weights, etc.), a resin, a solid support, or the like, to an expressed polypeptide at any desired amino acid position. It will be appreciated by one of skill in the art that the list above should not be taken as limiting. Additionally or alternatively, an unnatural amino acid comprising a 1,2 aminothiol group that has been incorporated into a polypeptide of interest can be reacted with a moiety comprising an aldehyde, e.g., any one or more of the moieties described above, to conjugate the polypeptide to the moiety via a thiazolidine.

Orthogonal tRNA/Aminoacyl-tRNA Synthetase Technology

An understanding of the novel compositions and methods of the present invention is further developed through an understanding of the activities associated with orthogonal tRNA and orthogonal aminoacyl-tRNA synthetase pairs. In order to add additional unnatural amino acids that comprise a 1,2 aminothiol group to the genetic code, new orthogonal pairs comprising an aminoacyl-tRNA synthetase and a suitable tRNA are needed that can function efficiently in the host translational machinery, but that are “orthogonal” to the translation system at issue, meaning that the O-RS/O-tRNA pair functions independently of the synthetases and tRNAs endogenous to the translation system. Desired characteristics of the orthogonal pair include a tRNA that decodes or recognizes only a specific codon, e.g., a selector codon, e.g., an amber stop codon, that is not decoded by any endogenous tRNA, and an aminoacyl-tRNA synthetase that preferentially aminoacylates, or “charges”, its cognate tRNA with only one specific unnatural amino acid. The O-tRNA is also not typically aminoacylated, or is poorly aminoacylated, i.e., charged, by endogenous synthetases. For example, in an E. coli host system, an orthogonal pair will include an aminoacyl-tRNA synthetase that does not cross-react with any of the endogenous tRNAs, e.g., of which there are 40 in E. coli, and an orthogonal tRNA that is not aminoacylated by any of the endogenous synthetases, e.g., of which there are 21 in E. coli.

The general principles of orthogonal translation systems that are suitable for making proteins that comprise one or more unnatural amino acid are known in the art, as are the general methods for producing orthogonal translation systems. For example, see International Publication Numbers WO 2002/086075, entitled “METHODS AND COMPOSITION FOR THE PRODUCTION OF ORTHOGONAL tRNA-AMINOACYL-tRNA SYNTHETASE PAIRS;” WO 2002/085923, entitled “IN VIVO INCORPORATION OF UNNATURAL AMINO ACIDS;” WO 2004/094593, entitled “EXPANDING THE EUKARYOTIC GENETIC CODE;” WO 2005/019415, filed Jul. 7, 2004; WO 2005/007870, filed Jul. 7, 2004; WO 2005/007624, filed Jul. 7, 2004; WO 2006/110182, filed Oct. 27, 2005, entitled “ORTHOGONAL TRANSLATION COMPONENTS FOR THE IN VIVO INCORPORATION OF UNNATURAL AMINO ACIDS” and WO 2007/103490, filed Mar. 7, 2007, entitled “SYSTEMS FOR THE EXPRESSION OF ORTHOGONAL TRANSLATION COMPONENTS IN EUBACTERIAL HOST CELLS.” Each of these applications is incorporated herein by reference in its entirety. See also, e.g., Liu, et al. (2007) “Genetic incorporation of unnatural amino acids into proteins in mammalian cells” Nat Methods 4:239-244; and WO 2006/110182 entitled “Orthogonal Translation Components for the In vivo Incorporation of Unnatural Amino Acids,” filed Oct. 27, 2005. For discussion of orthogonal translation systems that incorporate unnatural amino acids, and methods for their production and use, see also, Wang and Schultz, (2005) “Expanding the Genetic Code.” Angewandte Chemie Int Ed 44: 34-66; Xie and Schultz, (2005) “An Expanding Genetic Code.” Methods 36: 227-238; Xie and Schultz, (2005) “Adding Amino Acids to the Genetic Repertoire.” Curr Opinion in Chemical Biology 9: 548-554; and Wang et al., (2006) “Expanding the Genetic Code.” Annu Rev Biophys Biomol Struct 35: 225-249; Deiters, et al, (2005) “In vivo incorporation of an alkyne into proteins in Escherichia coli.” Bioorganic & Medicinal Chemistry Letters 15:1521-1524; Chin et al., (2002) “Addition of p-Azido-L-phenylalanine to the Genetic Code of Escherichia coli.” J Am Chem Soc 124: 9026-9027; and International Publication No. WO 2006/034332, filed on Sep. 20, 2005, the contents of each of which are incorporated by reference in their entirety. Additional details are found in U.S. Pat. No. 7,045,337; U.S. Pat. No. 7,083,970; U.S. Pat. No. 7,238,510; U.S. Pat. No. 7,129,333; U.S. Pat. No. 7,262,040; U.S. Pat. No. 7,183,082; U.S. Pat. No. 7,199,222; and U.S. Pat. No. 7,217,809.

Orthogonal Translation Systems

Orthogonal translation systems generally comprise cells, e.g., host cells such as E. coli, that include an orthogonal tRNA (O-tRNA), an orthogonal aminoacyl tRNA synthetase (O-RS), and an unnatural amino acid, e.g., an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., those depicted in FIG. 1, wherein the O-RS aminoacylates the O-tRNA with the unnatural amino acid that comprises a 1,2 aminothiol group, e.g., 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid. An orthogonal pair of the invention can include an O-tRNA, e.g., a suppressor tRNA, a frameshift tRNA, or the like, and a cognate O-RS. The orthogonal systems of the invention, which typically include O-tRNA/O-RS pairs, can comprise a cell or a cell-free environment. In addition to multi-component systems, the invention also provides novel individual components, for example, a novel orthogonal aminoacyl-tRNA synthetase polypeptide, e.g., SEQ ID NO: 1, and the polynucleotide that encodes that polypeptide, e.g., SEQ ID NO: 2.

In general, when an orthogonal pair recognizes a selector codon and loads an amino acid in response to the selector codon, the orthogonal pair is said to “suppress” the selector codon. That is, a selector codon that is not recognized by the translation system's, e.g., the E. coli cell's, endogenous machinery is not ordinarily charged, which results in blocking production of a polypeptide that would otherwise be translated from the nucleic acid. In an orthogonal pair system, the O-RS aminoacylates the O-tRNA with a specific unnatural amino acid, e.g., an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., any of the unnatural amino acids described herein. The charged O-tRNA recognizes the selector codon and suppresses the translational block caused by the selector codon.
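The effect of suppression can be illustrated with a toy translation model. In the Python sketch below, translation halts at a stop codon unless a charged suppressor is supplied for that codon, in which case the residue it carries is inserted and translation continues. The codon table is deliberately partial, the sequence is invented, and the symbol "X" for the unnatural residue is a hypothetical placeholder. The model is a simplification that ignores competition with release factors.

```python
# Partial toy codon table (assumption: only the codons used below are listed).
# None marks a stop codon that halts translation.
CODON_TABLE = {
    "ATG": "M", "GCT": "A", "AAA": "K", "GAA": "E",
    "TAA": None, "TAG": None, "TGA": None,
}


def translate(coding_dna: str, suppressor: dict = None) -> str:
    """Translate in frame, optionally suppressing selector codons.

    suppressor maps a selector codon (e.g. "TAG") to the one-letter symbol of
    the amino acid carried by the charged O-tRNA.
    """
    suppressor = suppressor or {}
    peptide = []
    for i in range(0, len(coding_dna) - 2, 3):
        codon = coding_dna[i:i + 3]
        if codon in suppressor:
            peptide.append(suppressor[codon])  # charged O-tRNA decodes the codon
            continue
        residue = CODON_TABLE[codon]
        if residue is None:
            break  # unsuppressed stop codon: translation is blocked here
        peptide.append(residue)
    return "".join(peptide)


gene = "ATGGCTTAGAAAGAATAA"            # toy gene with an in-frame amber codon
print(translate(gene))                  # truncated product without the pair
print(translate(gene, {"TAG": "X"}))    # full-length product with suppression
```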

In some aspects, an O-tRNA of the invention recognizes a selector codon and includes at least about, e.g., a 45%, a 50%, a 60%, a 75%, a 80%, or a 90% or more suppression efficiency in the presence of a cognate synthetase in response to a selector codon as compared to the suppression efficiency of an O-tRNA comprising or encoded by a polynucleotide sequence as set forth in the sequence listing herein.

In some embodiments, the suppression efficiency of the O-RS and the O-tRNA together is about, e.g., 5 fold, 10 fold, 15 fold, 20 fold, or 25 fold or more greater than the suppression efficiency of the O-tRNA lacking the O-RS. In some aspects, the suppression efficiency of the O-RS and the O-tRNA together is at least about, e.g., 35%, 40%, 45%, 50%, 60%, 75%, 80%, or 90% or more of the suppression efficiency of an orthogonal synthetase pair as set forth in the sequence listings herein.
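The two comparisons described above reduce to simple ratios. The Python sketch below shows one way they might be computed from reporter measurements. The function names and the example signal values are hypothetical, and the choice of reporter readout is left open.

```python
def fold_enhancement(signal_with_ors: float, signal_without_ors: float) -> float:
    """Suppression efficiency of O-tRNA plus O-RS relative to O-tRNA alone."""
    if signal_without_ors <= 0:
        raise ValueError("background signal must be positive")
    return signal_with_ors / signal_without_ors


def relative_efficiency_percent(test_pair: float, reference_pair: float) -> float:
    """Suppression efficiency of a test pair as a percentage of a reference pair."""
    if reference_pair <= 0:
        raise ValueError("reference signal must be positive")
    return 100.0 * test_pair / reference_pair


# Hypothetical reporter readings in arbitrary units.
print(fold_enhancement(250.0, 10.0))             # fold increase over O-tRNA alone
print(relative_efficiency_percent(45.0, 100.0))  # percent of a reference pair
```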

The translation system, e.g., an E. coli cell, uses the O-tRNA/O-RS pair to incorporate the unnatural amino acid that comprises a 1,2 aminothiol group into a growing polypeptide chain, e.g., via a nucleic acid that comprises a polynucleotide that encodes a polypeptide of interest, where the polynucleotide comprises a selector codon that is recognized by the O-tRNA. In certain preferred aspects, the cell can include one or more additional O-tRNA/O-RS pairs, where the additional O-tRNA is loaded by the additional O-RS with a different unnatural amino acid. For example, one of the O-tRNAs can recognize a four base codon and the other O-tRNA can recognize a stop codon. Alternately, multiple different stop codons, multiple different four base codons, multiple different rare codons and/or multiple different non-coding codons can be used in the same coding nucleic acid. For further details regarding available O-RS/O-tRNA cognate pairs and their use, see, e.g., the references noted above.

As noted, in some embodiments, there exist multiple O-tRNA/O-RS pairs in a translation system, which allow incorporation of more than one unnatural amino acid into a polypeptide. For example, the translation system can further include an additional different O-tRNA/O-RS pair and a second unnatural amino acid, where this additional O-tRNA recognizes a second selector codon and this additional O-RS preferentially aminoacylates the O-tRNA with the second unnatural amino acid. For example, a cell that includes an O-tRNA/O-RS pair, where the O-tRNA recognizes, e.g., an amber selector codon, can further comprise a second orthogonal pair, where the second O-tRNA recognizes a different selector codon, e.g., an opal codon, an ochre codon, a four-base codon, a rare codon, a non-coding codon, or the like. Desirably, the different orthogonal pairs are derived from different sources, which can facilitate recognition of different selector codons.

In certain embodiments, translation systems can comprise a cell, such as an E. coli cell, that includes an orthogonal tRNA (O-tRNA), an orthogonal aminoacyl-tRNA synthetase (O-RS), an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., any of the unnatural amino acids shown in FIG. 1, and a nucleic acid that comprises a polynucleotide that encodes a polypeptide of interest, where the polynucleotide comprises the selector codon that is recognized by the O-tRNA. Although orthogonal translation systems, e.g., translation systems comprising an O-RS, an O-tRNA and an unnatural amino acid that comprises a 1,2 aminothiol group, can utilize cultured cells to produce proteins having unnatural amino acids, it is not intended that an orthogonal translation system of the invention require an intact, viable cell. For example, an orthogonal translation system can utilize a cell-free system in the presence of a cell extract. Indeed, the use of cell free, in vitro transcription/translation systems for protein production is a well established technique. Adaptation of these in vitro systems to produce proteins having unnatural amino acids using orthogonal translation system components described herein is well within the scope of the invention.

The O-tRNA and/or the O-RS can be naturally occurring or can be, e.g., derived by mutation of a naturally occurring tRNA and/or RS, e.g., by generating libraries of tRNAs and/or libraries of RSs, from any of a variety of organisms and/or by using any of a variety of available mutation strategies. For example, one strategy for producing an orthogonal tRNA/aminoacyl-tRNA synthetase pair involves importing a tRNA/synthetase pair that is heterologous to the system in which the pair will function from a source, or multiple sources, other than the translation system in which the tRNA/synthetase pair will be used. The properties of the heterologous synthetase candidate include, e.g., that it does not charge any host cell tRNA, and the properties of the heterologous tRNA candidate include, e.g., that it is not aminoacylated by any host cell synthetase. In addition, the heterologous tRNA is orthogonal to all host cell synthetases. A second strategy for generating an orthogonal pair involves generating mutant libraries from which to screen and/or select an O-tRNA or O-RS. These strategies can also be combined.

Orthogonal tRNA (O-tRNA)

An orthogonal tRNA (O-tRNA) of the invention desirably mediates incorporation of an unnatural amino acid into a protein that is encoded by a polynucleotide that comprises a selector codon that is recognized by the O-tRNA, e.g., in vivo or in vitro. In certain embodiments, an O-tRNA of the invention includes at least about, e.g., a 45%, a 50%, a 60%, a 75%, a 80%, or a 90% or more suppression efficiency in the presence of a cognate synthetase in response to a selector codon as compared to an O-tRNA comprising or encoded by a polynucleotide sequence as set forth in the O-tRNA sequences in the sequence listing herein.

Examples of O-tRNAs of the invention are set forth in the sequence listing herein, for example, see FIG. 9 and SEQ ID NO: 3. The disclosure herein also provides guidance for the design of additional functionally similar O-tRNA species. In an RNA molecule, such as an O-RS mRNA, or O-tRNA molecule, Thymine (T) is replaced with Uracil (U) relative to a given sequence (or vice versa for a coding DNA), or complement thereof. Additional modifications to the bases can also be present to generate similar functionally equivalent molecules.
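The T/U correspondence between a DNA sequence and the matching RNA molecule amounts to a simple substitution, sketched below in Python. The function names are hypothetical and the example string is a toy sequence, not SEQ ID NO: 3. Modified bases present in mature tRNAs are not modeled.

```python
def dna_to_rna(seq: str) -> str:
    """Write a DNA sequence as the corresponding RNA (T replaced with U)."""
    return seq.upper().replace("T", "U")


def rna_to_dna(seq: str) -> str:
    """Write an RNA sequence as the corresponding coding DNA (U replaced with T)."""
    return seq.upper().replace("U", "T")


print(dna_to_rna("ATGTAG"))  # the amber codon TAG reads UAG in the mRNA
print(rna_to_dna("AUGUAG"))
```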

The invention also encompasses conservative variations of O-tRNAs corresponding to particular O-tRNAs herein. For example, conservative variations of O-tRNA include those molecules that function like the particular O-tRNAs, e.g., as in the sequence listing herein and that maintain the tRNA L-shaped structure by virtue of appropriate self-complementarity, but that do not have a sequence identical to that, e.g., in the sequence listing or FIG. 9, and desirably, are other than wild type tRNA molecules.

The composition comprising an O-tRNA can further include an orthogonal aminoacyl-tRNA synthetase (O-RS), where the O-RS preferentially aminoacylates the O-tRNA with an unnatural amino acid. In certain embodiments, a composition including an O-tRNA can further include a translation system, e.g., in vitro or in vivo. A nucleic acid that comprises a polynucleotide that encodes a polypeptide of interest, where the polynucleotide comprises a selector codon that is recognized by the O-tRNA, or a combination of one or more of these can also be present in the cell.

Methods for producing a recombinant orthogonal tRNA and screening its efficiency with respect to incorporating an unnatural amino acid into a polypeptide in response to a selector codon can be found, e.g., in International Application Publications WO 2002/086075, entitled “METHODS AND COMPOSITIONS FOR THE PRODUCTION OF ORTHOGONAL tRNA AMINOACYL-tRNA SYNTHETASE PAIRS;” WO 2004/094593, entitled “EXPANDING THE EUKARYOTIC GENETIC CODE;” and WO 2005/019415, filed Jul. 7, 2004. See also Forster, et al., (2003) “Programming peptidomimetic synthetases by translating genetic codes designed de novo.” Proc Natl Acad Sci USA 100: 6353-6357; and Feng, et al., (2003) “Expanding tRNA recognition of a tRNA synthetase by a single amino acid change.” Proc Natl Acad Sci USA 100: 5676-5681. Additional details are found in U.S. Pat. No. 7,045,337; U.S. Pat. No. 7,083,970; U.S. Pat. No. 7,238,510; U.S. Pat. No. 7,129,333; U.S. Pat. No. 7,262,040; U.S. Pat. No. 7,183,082; U.S. Pat. No. 7,199,222; and U.S. Pat. No. 7,217,809.

Orthogonal Aminoacyl-tRNA Synthetase (O-RS)

The O-RS of the invention preferentially aminoacylates an O-tRNA with an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid, in vitro or in vivo. The O-RS of the invention can be provided to the translation system, e.g., an E. coli cell, by a polypeptide that includes an O-RS and/or by a polynucleotide that encodes an O-RS or a portion thereof. For example, an example O-RS comprises an amino acid sequence as set forth in SEQ ID NO: 1, or a conservative variation thereof. In another example, an O-RS, or a portion thereof, is encoded by a polynucleotide sequence that encodes an amino acid comprising sequence in the sequence listing or examples herein, or a complementary polynucleotide sequence thereof. See, e.g., the polynucleotide of SEQ ID NO: 2.

General details for producing an O-RS, assaying its aminoacylation efficiency, and/or altering its substrate specificity can be found in International Publication Numbers WO 2002/086075, entitled “METHODS AND COMPOSITIONS FOR THE PRODUCTION OF ORTHOGONAL tRNA AMINOACYL-tRNA SYNTHETASE PAIRS;” and WO 2004/094593, entitled “EXPANDING THE EUKARYOTIC GENETIC CODE.” See also, Wang and Schultz “Expanding the Genetic Code,” Angewandte Chemie Int Ed 44: 34-66 (2005); and Hoben and Soll (1985) Methods Enzymol 113: 55-59, the contents of which are incorporated by reference in their entirety. Additional details are found in U.S. Pat. No. 7,045,337; U.S. Pat. No. 7,083,970; U.S. Pat. No. 7,238,510; U.S. Pat. No. 7,129,333; U.S. Pat. No. 7,262,040; U.S. Pat. No. 7,183,082; U.S. Pat. No. 7,199,222; and U.S. Pat. No. 7,217,809. Methods are also elaborated below.

Methods for identifying an orthogonal aminoacyl-tRNA synthetase (O-RS), e.g., an O-RS that preferentially aminoacylates a cognate O-tRNA with an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., any of the unnatural amino acids shown in FIG. 1, are described in this disclosure (see the Example). For example, a method includes subjecting a population of cells to a positive and a negative selection. Each cell in the population comprises a member of a plurality of aminoacyl-tRNA synthetases (RSs). The plurality of RSs can include mutant RSs, RSs derived from a different species, e.g., a species other than that of the aforementioned cells, or both mutant RSs and RSs derived from a different species. Each cell in the population also comprises the orthogonal tRNA (O-tRNA), e.g., that can be derived from the same species as or a different species than that of the plurality of RSs. Each cell also comprises a polynucleotide that encodes a selection marker, e.g., a positive selection marker, and comprises at least one selector codon.

Cells are selected or screened for those that show an enhancement in suppression efficiency compared to cells lacking or comprising a reduced amount of the member of the plurality of RSs. Suppression efficiency can be measured by techniques known in the art and by techniques described in Xie, et al. (2005) “An expanding genetic code.” Methods 36: 227-238. Cells having an enhancement in suppression efficiency each comprise an active RS that can aminoacylate the O-tRNA with an unnatural amino acid comprising a 1,2 aminothiol group. The level of aminoacylation can be determined by a detectable substance, e.g., a labeled unnatural amino acid. An O-RS, identified by the method, is also a feature of the invention.

Any of a number of assays can be used to determine aminoacylation. These assays can be performed in vitro or in vivo. For example, in vitro aminoacylation assays are described in, e.g., Hoben and Soll (1985) Methods Enzymol. 113: 55-59. Aminoacylation can also be determined by using a reporter along with orthogonal translation components and detecting the reporter in a cell expressing a polynucleotide comprising at least one selector codon that encodes a protein. See also, WO 2002/085923, entitled “IN VIVO INCORPORATION OF UNNATURAL AMINO ACIDS;” and WO 2004/094593, entitled “EXPANDING THE EUKARYOTIC GENETIC CODE.”

Identified O-RSs, e.g., O-RSs capable of aminoacylating a cognate O-tRNA with an unnatural amino acid that comprises a 1,2 aminothiol group, can be further manipulated to alter their substrate specificities so that only a desired unnatural amino acid, but not any of the 20 natural, e.g., proteinogenic, amino acids, is charged to the O-tRNA. Methods to generate an orthogonal aminoacyl-tRNA synthetase with a substrate specificity for an unnatural amino acid comprising a 1,2 aminothiol group include mutating the synthetase, e.g., at the active site in the synthetase, at the editing mechanism site in the synthetase, at different sites by combining different domains of synthetases, or the like, and applying a selection process in which a positive selection is followed by a negative selection. In the positive selection, suppression of the selector codon introduced at a nonessential position(s) of a positive marker allows cells to survive under positive selection pressure. In the presence of both natural and unnatural amino acids, survivors thus encode active synthetases charging the orthogonal suppressor tRNA with either a natural or unnatural amino acid, e.g., an unnatural amino acid comprising a 1,2 aminothiol group. In the negative selection, suppression of a selector codon introduced at a nonessential position(s) of a negative marker removes synthetases with specificities for natural amino acids. Survivors of the negative and positive selection encode synthetases that aminoacylate (charge) the orthogonal suppressor tRNA with unnatural amino acids only. These synthetases can then be subjected to further mutagenesis, e.g., DNA shuffling or other recursive mutagenesis methods, and iterative rounds of positive and negative selection.
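The logic of the two-step selection can be sketched as a filter over a library of synthetase variants. The Python model below is a deliberately simplified illustration, not the experimental protocol: the variant names and boolean properties are hypothetical, each cell is reduced to the charging behavior of the synthetase it carries, and the negative selection is assumed to be run without the unnatural amino acid, which is a common arrangement but is not stated in the passage above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A synthetase variant, reduced to what it can charge onto the O-tRNA."""
    name: str
    charges_unnatural: bool
    charges_natural: bool


def positive_selection(library):
    """Natural and unnatural amino acids present: any active variant suppresses
    the selector codon in the positive marker, so its host cell survives."""
    return [v for v in library if v.charges_unnatural or v.charges_natural]


def negative_selection(library):
    """Unnatural amino acid withheld (assumption): variants that charge a natural
    amino acid suppress the selector codon in the negative marker and are removed."""
    return [v for v in library if not v.charges_natural]


library = [
    Variant("RS-A", charges_unnatural=False, charges_natural=False),  # inactive
    Variant("RS-B", charges_unnatural=True, charges_natural=False),   # desired
    Variant("RS-C", charges_unnatural=False, charges_natural=True),
    Variant("RS-D", charges_unnatural=True, charges_natural=True),    # promiscuous
]

survivors = negative_selection(positive_selection(library))
print([v.name for v in survivors])
```

Only the variant that charges the unnatural amino acid and no natural amino acid passes both steps, which mirrors the outcome described in the text. Real selections are probabilistic, which is why iterative rounds are described.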

A library of mutant O-RSs can be generated using various mutagenesis techniques known in the art. For example, the mutant RSs can be generated by site-specific mutations, random point mutations, homologous recombination, DNA shuffling or other recursive mutagenesis methods, chimeric construction or any combination thereof. For example, a library of mutant RSs can be produced from two or more other, e.g., smaller, less diverse “sub-libraries.” It should be noted that libraries of tRNA synthetases from various organisms (e.g., microorganisms such as eubacteria or archaebacteria) such as libraries that comprise natural diversity (see, e.g., U.S. Pat. No. 6,238,884 to Short et al; U.S. Pat. No. 5,756,316 to Schallenberger et al; U.S. Pat. No. 5,783,431 to Petersen et al; U.S. Pat. No. 5,824,485 to Thompson et al; U.S. Pat. No. 5,958,672 to Short et al), are optionally constructed and screened for orthogonal pairs.

Once the synthetases have been subjected to the positive and negative selection/screening strategy, these synthetases can then be subjected to further mutagenesis. For example, a nucleic acid that encodes the O-RS can be isolated, and a set of polynucleotides that encode mutated O-RSs, e.g., generated by random mutagenesis, site-specific mutagenesis, recombination or any combination thereof, can be generated from the nucleic acid. These individual steps, or a combination of these steps, can be repeated until a mutated O-RS is obtained that preferentially aminoacylates the O-tRNA with an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., any of the unnatural amino acids shown in FIG. 1. In some aspects of the invention, the steps are performed multiple times, e.g., at least two times.

Additional levels of selection/screening stringency can also be used in the methods of the invention, for producing O-tRNA, O-RS, or pairs thereof. The selection or screening stringency can be varied on one or both steps of the method to produce an O-RS. This could include, e.g., varying the amount of selection/screening agent that is used, etc. Additional rounds of positive and/or negative selections can also be performed. Selecting or screening can also comprise one or more of a change in amino acid permeability, a change in translation efficiency, a change in translational fidelity, etc. Typically, the one or more changes are based upon a mutation in one or more genes in an organism in which an orthogonal tRNA-tRNA synthetase pair is used to produce protein.

Source and Host Organisms

The orthogonal translational components (O-tRNA and O-RS) of the invention can be derived from any organism, or a combination of organisms, for use in a host translation system from any other species, with the caveat that the O-tRNA/O-RS components and the host system work in an orthogonal manner. It is not a requirement that the O-tRNA and the O-RS from an orthogonal pair be derived from the same organism. In some aspects, the orthogonal components are derived from archaebacterial genes for use in a eubacterial host system.

For example, the orthogonal O-tRNA can be derived from an archaebacterium, such as Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Halobacterium such as Haloferax volcanii and Halobacterium species NRC-1, Archaeoglobus fulgidus, Pyrococcus furiosus, Pyrococcus horikoshii, Aeropyrum pernix, Methanococcus maripaludis, Methanopyrus kandleri, Methanosarcina mazei (Mm), Pyrobaculum aerophilum, Pyrococcus abyssi, Sulfolobus solfataricus (Ss), Sulfolobus tokodaii, Thermoplasma acidophilum, Thermoplasma volcanium, or the like, or a eubacterium, such as Escherichia coli, Thermus thermophilus, Bacillus subtilis, Bacillus stearothermophilus, or the like, while the orthogonal O-RS can be derived from an organism or combination of organisms, e.g., an archaebacterium, such as Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Halobacterium such as Haloferax volcanii and Halobacterium species NRC-1, Archaeoglobus fulgidus, Pyrococcus furiosus, Pyrococcus horikoshii, Aeropyrum pernix, Methanococcus maripaludis, Methanopyrus kandleri, Methanosarcina mazei, Pyrobaculum aerophilum, Pyrococcus abyssi, Sulfolobus solfataricus, Sulfolobus tokodaii, Thermoplasma acidophilum, Thermoplasma volcanium, or the like, or a eubacterium, such as Escherichia coli, Thermus thermophilus, Bacillus subtilis, Bacillus stearothermophilus, or the like. In one embodiment, eukaryotic sources, e.g., plants, algae, protists, fungi, yeasts, animals, e.g., mammals, insects, arthropods, or the like can also be used as sources of O-tRNAs and O-RSs.

The individual components of an O-tRNA/O-RS pair can be derived from the same organism or different organisms. In one embodiment, the O-tRNA/O-RS pair is from the same organism. Alternatively, the O-tRNA and the O-RS of the O-tRNA/O-RS pair are from different organisms.

The O-tRNA, O-RS or O-tRNA/O-RS pair can be selected or screened in vivo or in vitro and/or used in a cell, e.g., a eubacterial cell, to produce a polypeptide with an unnatural amino acid. The eubacterial cell used is not limited and can be, for example, Escherichia coli, Thermus thermophilus, Bacillus subtilis, Bacillus stearothermophilus, or the like. Compositions of eubacterial cells comprising translational components of the invention are also a feature of the invention.

See also, International Application Publication Number WO 2004/094593, entitled “EXPANDING THE EUKARYOTIC GENETIC CODE,” filed Apr. 16, 2004, for screening O-tRNA and/or O-RS in one species for use in another species. Additional details are found in Wang and Schultz, (2005) “Expanding the Genetic Code.” Angewandte Chemie Int Ed 44: 34-66; Xie and Schultz, (2005) “An Expanding Genetic Code.” Methods 36: 227-238; Xie and Schultz, (2005) “Adding Amino Acids to the Genetic Repertoire.” Curr Opinion in Chemical Biology 9: 548-554; and Wang et al., (2006) “Expanding the Genetic Code.” Annu Rev Biophys Biomol Struct 35: 225-249, and U.S. Pat. No. 7,045,337; U.S. Pat. No. 7,083,970; U.S. Pat. No. 7,238,510; U.S. Pat. No. 7,129,333; U.S. Pat. No. 7,262,040; U.S. Pat. No. 7,183,082; U.S. Pat. No. 7,199,222; and U.S. Pat. No. 7,217,809.

Selector Codons

Selector codons of the invention expand the genetic codon framework of protein biosynthetic machinery. For example, a selector codon includes, e.g., a unique three base codon, a nonsense codon, such as a stop codon, e.g., an amber codon (UAG), or an opal codon (UGA), an unnatural codon, at least a four base codon, a rare codon, or the like. A number of selector codons can be introduced into a desired gene, e.g., one or more, two or more, more than three, etc. Conventional site-directed mutagenesis can be used to introduce the selector codon at the site of interest in a polynucleotide encoding a polypeptide of interest. See, e.g., Sayers, J. R., et al. (1988) “5′,3′ Exonuclease in phosphorothioate-based oligonucleotide-directed mutagenesis.” Nucl Acid Res 16: 791-802. By using different selector codons, multiple orthogonal tRNA/synthetase pairs can be used that allow the simultaneous site-specific incorporation of multiple unnatural amino acids, e.g., including at least one unnatural amino acid, using these different selector codons.
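
At the sequence level, introducing a selector codon at a site of interest amounts to replacing one codon in the coding sequence. The Python sketch below illustrates that replacement in silico only. The function name `introduce_selector_codon`, the 1-based residue numbering, and the example open reading frame are hypothetical; TAG is used as the DNA form of the amber codon UAG.

```python
def introduce_selector_codon(cds: str, residue: int, selector: str = "TAG") -> str:
    """Replace the codon for a 1-based residue position with a selector codon.

    The selector may be longer than three bases (e.g., a four-base codon),
    in which case the returned sequence is correspondingly longer.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    start = 3 * (residue - 1)
    if residue < 1 or start + 3 > len(cds):
        raise ValueError("residue position lies outside the coding sequence")
    return cds[:start] + selector + cds[start + 3:]


wild_type = "ATGGCTAAATAA"  # hypothetical ORF: Met-Ala-Lys-stop
mutant = introduce_selector_codon(wild_type, residue=2)
print(mutant)  # ATGTAGAAATAA
```

In practice the edited sequence would be the design target for the site-directed mutagenesis step; the sketch does not model primer design or the mutagenesis chemistry.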

Unnatural amino acids can also be encoded with rare codons. For example, when the arginine concentration in an in vitro protein synthesis reaction is reduced, the rare arginine codon, AGG, has proven to be efficient for insertion of Ala by a synthetic tRNA acylated with alanine. See, e.g., Ma, C. et al., (1993) “In vitro protein engineering using synthetic tRNA^(Ala) with different anticodons.” Biochemistry 32: 7939-7945. In this case, the synthetic tRNA competes with the naturally occurring tRNA^(Arg), which exists as a minor species in Escherichia coli. In addition, some organisms do not use all triplet codons. An unassigned codon AGA in Micrococcus luteus has been utilized for insertion of amino acids in an in vitro transcription/translation extract. See, e.g., Kowal and Oliver, (1997) “Exploiting unassigned codons in Micrococcus luteus for tRNA-based amino acid mutagenesis.” Nucl Acid Res 25: 4685-4689. Components of the invention can be generated to use these rare codons in vivo.

Selector codons can also comprise extended codons, e.g., four or more base codons, such as, four, five, six or more base codons. Examples of four base codons include, e.g., AGGA, CUAG, UAGA, CCCU, and the like. Examples of five base codons include, e.g., AGGAC, CCCCU, CCCUC, CUAGA, CUACU, UAGGC and the like. Methods of the invention include using extended codons based on frameshift suppression. Four or more base codons can insert, e.g., one or multiple unnatural amino acids, into the same protein. In other embodiments, the anticodon loops can decode, e.g., at least a four-base codon, at least a five-base codon, or at least a six-base codon or more. Since there are 256 possible four-base codons, multiple unnatural amino acids can be encoded in the same cell using a four or more base codon. See also, Anderson, et al., (2002) “Exploring the Limits of Codon and Anticodon Size.” Chemistry and Biology 9: 237-244; Magliery, et al., (2001) “Expanding the Genetic Code: Selection of Efficient Suppressors of Four-base Codons and Identification of “Shifty” Four-base Codons with a Library Approach in Escherichia coli.” J Mol Biol 307: 755-769; Ma, C., et al., (1993) “In vitro protein engineering using synthetic tRNA^(Ala) with different anticodons.” Biochemistry 32:7939; Hohsaka, et al., (1999) “Efficient Incorporation of Nonnatural Amino Acids with Large Aromatic Groups into Streptavidin in In Vitro Protein Synthesizing Systems.” J Am Chem Soc 121: 34-40; and Moore, et al., (2000) “Quadruplet Codons: Implications for Code Expansion and the Specification of Translation Step Size.” J Mol Biol 298: 195-209. Four base codons have been used as selector codons in a variety of orthogonal systems. See, e.g., WO 2005/019415; WO 2005/007870 and WO 2005/07624. See also, Wang and Schultz, (2005) “Expanding the Genetic Code.” Angewandte Chemie Int Ed 44: 34-66, the content of which is incorporated by reference in its entirety.
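
The figure of 256 possible four-base codons follows from four bases at each of four positions (4^4 = 256). As a minimal check, the Python sketch below enumerates extended codons of a given length; the helper name `extended_codons` is hypothetical and the enumeration says nothing about which codons are efficiently suppressed in a given host.

```python
from itertools import product

BASES = "ACGU"


def extended_codons(length: int):
    """Return every RNA codon of the given length as a string."""
    return ["".join(bases) for bases in product(BASES, repeat=length)]


four_base = extended_codons(4)
five_base = extended_codons(5)

print(len(four_base))  # 256
print(len(five_base))  # 1024

# The example four-base codons named in the text are all members of the set.
print(all(codon in four_base for codon in ["AGGA", "CUAG", "UAGA", "CCCU"]))  # True
```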

For a given system, a selector codon can also include one of the natural three base codons, where the endogenous system does not use (or rarely uses) the natural base codon. For example, this includes a system that is lacking a tRNA that recognizes the natural three base codon, and/or a system where the three base codon is a rare codon.

Selector codons optionally include unnatural base pairs. Descriptions of unnatural base pairs that can be adapted for methods and compositions include, e.g., Hirao, et al., (2002) “An unnatural base pair for incorporating amino acid analogues into protein.” Nature Biotechnology 20: 177-182. See also Wu, et al., (2002) “Enzymatic Phosphorylation of Unnatural Nucleosides.” J Am Chem Soc 124: 14626-14630.

Nucleic Acid and Polypeptide Sequences and Variants

As described herein, the invention provides for polynucleotide sequences encoding, e.g., O-tRNAs and O-RSs, and polypeptide amino acid sequences, e.g., O-RSs, and, e.g., compositions, systems and methods comprising said polynucleotide or polypeptide sequences. Examples of said sequences, e.g., O-tRNA and O-RS amino acid and nucleotide sequences are disclosed herein (see FIG. 9 and SEQ ID NOs: 1, 2, and 3). However, one of skill in the art will appreciate that the invention is not limited to those sequences disclosed herein, e.g., in the Examples and sequence listing. One of skill will appreciate that the invention also provides many related sequences with the functions described herein, e.g., polynucleotides and polypeptides encoding conservative variants of an O-RS disclosed herein.

Owing to the degeneracy of the genetic code, “silent substitutions”, i.e., substitutions in a nucleic acid sequence that do not result in an alteration in an encoded polypeptide, are an implied feature of every nucleic acid sequence that encodes an amino acid sequence. Similarly, “conservative amino acid substitutions,” where one or a limited number of amino acids in an amino acid sequence are substituted with different amino acids with highly similar properties, are also readily identified as being highly similar to a disclosed construct. Such conservative variations of each disclosed sequence are a feature of the present invention.

Conservative substitution tables providing functionally similar amino acids are well known in the art, where one amino acid residue is substituted for another amino acid residue having similar chemical properties (e.g., aromatic side chains or positively charged side chains), and therefore does not substantially change the functional properties of the polypeptide molecule. The following sets forth example groups that contain natural amino acids of like chemical properties, where a substitution within a group is a “conservative substitution”.

TABLE 1
Conservative Substitutions

Nonpolar and/or   Polar,         Aromatic        Positively     Negatively
Aliphatic         Uncharged      Side Chains     Charged        Charged
Side Chains       Side Chains                    Side Chains    Side Chains

Glycine           Serine         Phenylalanine   Lysine         Aspartate
Alanine           Threonine      Tyrosine        Arginine       Glutamate
Valine            Cysteine       Tryptophan      Histidine
Leucine           Methionine
Isoleucine        Asparagine
Proline           Glutamine
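
The grouping in Table 1 can be expressed as a simple lookup. The Python sketch below is one possible encoding, assuming the five groups as read from Table 1; the dictionary name, group keys, three-letter residue codes, and the function `is_conservative` are hypothetical conveniences and not part of the disclosure.

```python
# Groups as read from Table 1, using standard three-letter residue codes.
CONSERVATIVE_GROUPS = {
    "nonpolar_aliphatic": {"Gly", "Ala", "Val", "Leu", "Ile", "Pro"},
    "polar_uncharged": {"Ser", "Thr", "Cys", "Met", "Asn", "Gln"},
    "aromatic": {"Phe", "Tyr", "Trp"},
    "positively_charged": {"Lys", "Arg", "His"},
    "negatively_charged": {"Asp", "Glu"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True when two different residues fall within the same Table 1 group."""
    if original == replacement:
        return False  # not a substitution at all
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS.values()
    )


print(is_conservative("Lys", "Arg"))  # True: both positively charged
print(is_conservative("Lys", "Asp"))  # False: opposite charge groups
```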

Applications for Proteins and Polypeptides Comprising Unnatural Amino Acids That Comprise a 1,2 Aminothiol Group

Methods and compositions for producing genetically encoded polypeptides comprising at least one unnatural amino acid that comprises a 1,2 aminothiol group, e.g., any of the unnatural amino acids shown in FIG. 1, are a feature of this invention. The unnatural amino acids described herein can participate in native chemical ligation (NCL) reactions, in which a sulfhydryl group of the unnatural amino acid undergoes a transesterification step with an available thioester on a second moiety to form a thioester-linked intermediate which spontaneously rearranges, via an S- to N-acyl shift, to form a peptide bond. An unnatural amino acid that comprises a 1,2 aminothiol moiety can also be readily reacted with an aldehyde moiety to form a thiazolidine. Site-specific incorporation of unnatural amino acids comprising 1,2 aminothiol groups into expressed polypeptides expands the utility of NCL and/or of thiazolidine formation by varying the position and number of potential attachment sites in a polypeptide to which, e.g., a second amino acid in the polypeptide comprising the unnatural amino acid with a 1,2 aminothiol group, a second translationally synthesized polypeptide, a second synthetic peptide, a second semi-synthetic peptide, an oligonucleotide, a DNA, an RNA, a nucleotide analog, an affinity tag (e.g., biotin, FLAG, hexahistidine, etc.), a synthetic drug, a carbohydrate derivative, a fluorophore (e.g., Cascade Blue, Alexa568, Alexa647, etc.), a chromophore (e.g., phytochrome, phycobilin, bilirubin, etc.), a spin label (such as nitroxide), a toxin, a metal chelator (such as nitrilotriacetate), a photocrosslinker (such as p-azidoiodoacetanilide), an NMR probe, an X-ray probe, a pH probe, an IR probe, a dye, a sugar, a hapten, a cofactor, a fatty acid, a terpene (e.g., geraniol, limonene, farnesol, etc.), a polyethylene glycol (e.g., a branched PEG, a linear PEG, PEGs of different molecular weights, etc.), a resin, a solid support, or the like, can be bound. The molecular entities comprising a thioester to which an unnatural amino acid comprising a 1,2 aminothiol group can be ligated via NCL, and/or the moieties comprising an aldehyde to which an unnatural amino acid comprising a 1,2 aminothiol group can be conjugated via a thiazolidine bond, are well known to those of skill in the art and are not particularly limiting, provided that they comprise a functional group that can participate in an NCL reaction with an aminothiol group, or react with an aldehyde group to form a thiazolidine. The expanded versatility of these reactions can be particularly useful in a myriad of applications in biotechnology, biomedical research, and chemical biology, including the study of protein structure, enzyme mechanism, protein-protein interactions, etc. It will be appreciated by those of skill in the art that the applications for polypeptides and proteins comprising an unnatural amino acid with a 1,2 aminothiol group that are described below are not to be taken as limiting.

Protein cyclization can provide insights into the importance of a protein's N- and C-termini in, e.g., protein folding and structural stability, can increase or prolong a protein's activity, and can protect a protein against proteolytic cleavage. Moreover, protein cyclization can reduce structural flexibility, which may aid the study or crystallization of proteins that are otherwise unstable in the absence of, e.g., a binding partner or a ligand. However, many of the synthetic methodologies developed for the cyclization of small bioactive peptides cannot be easily used in larger proteins. One solution to this problem can be to incorporate an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid, into a protein that is to be circularized. By incorporating a second unnatural amino acid comprising a thioester group or an aldehyde group into the same protein, the two functional groups can react, e.g., in vivo, via NCL or thiazolidine formation, respectively, to generate a cyclic peptide inside a living cell.

Interactions between proteins and other biomolecules, e.g., in live cells, can be captured in vivo using proteins that comprise unnatural amino acids that can participate in NCL reactions or form thiazolidine bonds, e.g., those UAA described herein. For example, a protein comprising such an unnatural amino acid can be ligated in a site-specific manner via NCL, e.g., in vivo, to a crosslinker or photocrosslinker that comprises a thioester group, or via thiazolidine formation, e.g., to a crosslinker or photocrosslinker that comprises an aldehyde group. The modified protein can then be used to study the in vivo behavior of molecules in real time, capturing dynamic events, e.g., in the assembly and disassembly of multiprotein complexes or in the activation or deactivation of signal transduction cascades.

Real-time protein folding, protein transport, and protein dynamics can also be monitored in vivo using proteins that comprise unnatural amino acids that can participate in NCL reactions or form thiazolidine bonds. A protein of interest comprising such an unnatural amino acid can be expressed in live cells that are then incubated with, e.g., a cell-permeable fluorescent probe (e.g., Cascade Blue, Alexa568, Alexa647, etc.), a dye, a pH probe, or a chromophore (e.g., phytochrome, bilirubin, phycobilin, etc.), that comprises a reactive thioester or aldehyde group. The probe can efficiently penetrate the cell membrane, and chemoselectively react with the 1,2 aminothiol group of the unnatural amino acid, producing a labeled protein of interest that can be monitored via, e.g., fluorescence microscopy.

Likewise, proteins comprising any one or more unnatural amino acids of the invention can be reacted with thioester- or aldehyde-derivatized affinity tags (e.g., biotin, FLAG, hexahistidine, etc.) to facilitate, e.g., protein purification or immunohistochemical analyses, such as western blots, ELISAs, antibody staining, etc.

The ability to conjugate polypeptides to nucleic acids, including RNA and DNA, is important in a number of life science applications. For example, a polypeptide-nucleic acid conjugate made by reacting, e.g., a fluorescent or a radioactive polypeptide comprising an amino acid depicted in FIG. 1 with, e.g., a thioester- or aldehyde-derivatized RNA, DNA, or oligonucleotide, can be useful in the labeling of nucleic acid probes for use in, e.g., Southern blots, northern blots, quantitative PCR, and other analyses.

Unnatural amino acids that comprise 1,2 aminothiol groups can be useful in studies in which NMR spectroscopy is used to determine a protein's structure. The incorporation of such an amino acid into a protein of interest allows the site-specific ligation of, e.g., a thioester- or aldehyde-bearing moiety comprising an NMR-sensitive nucleus, into, e.g., a discrete domain of a very large protein. This strategy can lead to simplified NMR spectra and can reduce the loss of spectral resolution that occurs as a result of increased line widths and increased numbers of signals with similar chemical shifts. Other aldehyde- or thioester-derivatized biophysical probes, e.g., spin labels (such as nitroxide), NMR probes, IR probes, X-ray probes, and the like, can be similarly conjugated to a protein of interest comprising, e.g., an unnatural amino acid depicted in FIG. 1, in preparation to investigate the protein's structural dynamics.

One of the most critical issues in the production of, e.g., a protein microarray is the development of protein immobilization methods. Optimally, a method would allow the efficient immobilization of proteins onto a solid support, e.g., a glass surface, while maintaining the proteins' native structures and biological functions. A target protein into which an unnatural amino acid comprising a 1,2 aminothiol group has been incorporated can be efficiently immobilized, e.g., via covalent bond formation, under aqueous conditions in a chemoselective manner, e.g., via NCL on a thioester-functionalized surface or via thiazolidine formation on an aldehyde-functionalized surface, e.g., a glass slide. The amino acid position at which the unnatural amino acid, e.g., an unnatural amino acid depicted in FIG. 1, is incorporated can be carefully chosen to ensure that the activity and/or conformation of the target protein is not compromised by immobilization. A similar strategy can be used to immobilize a protein of interest onto a resin or a solid support.

Expressed protein ligation (EPL) is a protein semisynthesis method in which two or more, e.g., synthetic peptide segments or recombinantly expressed proteins, are covalently joined in a chemoselective manner via, e.g., NCL. This protein engineering technique can be used to synthesize protein molecules, e.g., branched protein molecules, that possess altered structural and functional properties that would be useful tools in drug screening and in understanding complex processes such as, e.g., signal transduction and protein-ligand interactions. EPL can also be used to ligate, e.g., two or more folded protein domains to produce a desired chimera, which, if expressed translationally, would otherwise be unable to fold. However, EPL via NCL is still limited by the requirement that one of the peptide segments comprise an N-terminal Cys residue that can react with a thioester or aldehyde group in a second peptide segment. This constraint can be overcome by the translational incorporation of an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., any of the unnatural amino acids described herein, into the peptide segment(s) that are to be ligated via NCL or via thiazolidine formation. Such a UAA can be incorporated into a polypeptide segment, e.g., a synthetic polypeptide segment, a semi-synthetic polypeptide segment, or a polypeptide segment produced via translation, at an amino acid position that permits the NCL reaction or thiazolidine bond formation without disturbing the segment's biological activity.

Glycosylation is one of the most common post-translational modifications of proteins in eukaryotes and affects a wide range of protein functions from folding and secretion to biomolecular recognition and serum half-life. See, e.g., R. A. Dwek, (1996) “Glycobiology: toward understanding the function of sugars.” Chem Rev 96: 683-720. While there have been significant advances in our understanding of the effects of glycosylation, the specific roles of oligosaccharide chains and the relationships between their structures and functions are just beginning to be understood. See, e.g., C. R. Bertozzi, et al. (2001) “Chemical Glycobiology.” Science 291: 2357-2364. The primary challenge is that glycoproteins are typically produced as a mixture of glycoforms, making it difficult to isolate homogenous samples of, e.g., a glycoprotein of interest comprising a particular defined oligosaccharide structure, from natural sources. The translational incorporation of an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., any of the unnatural amino acids described herein, into a protein of interest, followed by the attachment of, e.g., a carbohydrate or carbohydrate derivative of interest, to the desired protein at a defined amino acid position can permit the large-scale production of proteins that comprise a defined post-translational modification. This strategy can also be used to introduce a variety of additional modifications, e.g., a nucleotide analog, a metal chelator, a fatty acid, a terpene, a hapten, a toxin, a lipid, a PEG (e.g., a branched PEG, a linear PEG, PEGs of different molecular weights), or a synthetic drug (for a description of synthetic drugs, see, e.g., Remington: The Science and Practice of Pharmacy. University of the Sciences in Philadelphia, eds. (Lippincott Williams & Wilkins, 2006)), to a protein of interest in order to produce a homogenous sample.

A protein comprising an unnatural amino acid of the invention, e.g., an unnatural amino acid depicted in FIG. 1, to which a saccharide moiety has been ligated, e.g., via NCL, can be further glycosylated. Subsequent glycosylation steps can be carried out enzymatically, e.g., in vitro or in vivo, using, for example, a glycosyltransferase, glycosidase, or other enzyme known to those of skill in the art.

For enzymatic saccharide syntheses that involve glycosyltransferase reactions, the cells of the invention optionally contain at least one heterologous gene that encodes a glycosyltransferase. Many glycosyltransferases are known, as are their polynucleotide sequences. See, e.g., “The WWW Guide To Cloned Glycosyltransferases,” (available on the World Wide Web at www.vei.co.uk/TGN/gt_guide.htm). Glycosyltransferase amino acid sequences and nucleotide sequences encoding glycosyltransferases from which the amino acid sequences can be deduced are also found in various publicly available databases, including GenBank, Swiss-Prot, EMBL, and others.

Glycosyltransferases that can be employed, e.g., in the cells of the invention, e.g., to further modify saccharides ligated to proteins of interest at the site of an unnatural amino acid that comprises a 1,2 aminothiol group, include, but are not limited to, galactosyltransferases, fucosyltransferases, glucosyltransferases, N-acetylgalactosaminyltransferases, N-acetylglucosaminyltransferases, glucuronyltransferases, sialyltransferases, mannosyltransferases, glucuronic acid transferases, galacturonic acid transferases, oligosaccharyltransferases, and the like. Suitable glycosyltransferases include those obtained from eukaryotes, as well as from prokaryotes.

The glycosylation reactions which further modify sugars that have been ligated, e.g., via NCL, to a desired protein at the site of an unnatural amino acid comprising a 1,2 aminothiol group include, in addition to the appropriate glycosyltransferase and acceptor, an activated nucleotide sugar that acts as a sugar donor for the glycosyltransferase. The reactions can also include other ingredients that facilitate glycosyltransferase activity. These ingredients can include a divalent cation (e.g., Mg²⁺ or Mn²⁺), materials necessary for ATP regeneration, phosphate ions, and organic solvents. The concentrations or amounts of the various reactants used in the processes depend upon numerous factors including reaction conditions such as temperature and pH value, and the choice and amount of acceptor saccharides to be glycosylated. The reaction medium may also comprise solubilizing detergents (e.g., Triton or SDS) and organic solvents such as methanol or ethanol, if necessary.

Further details regarding systems and methods for the preparation of glycoproteins using proteins that comprise unnatural amino acids are elaborated in U.S. patent application Ser. No. 11/255,601, titled “IN VIVO SITE-SPECIFIC INCORPORATION OF N-ACETYL-GALACTOSAMINE AMINO ACIDS IN EUBACTERIA” and U.S. Pat. No. 6,927,042, titled “GLYCOPROTEIN SYNTHESIS”, the contents of each of which are incorporated by reference in their entirety.

Proteins and Polypeptides of Interest

Essentially any protein (or portion thereof) that includes an unnatural amino acid that comprises a 1,2 aminothiol group, e.g., an unnatural amino acid shown in FIG. 1, can be produced using the compositions and methods herein. No attempt is made to identify the hundreds of thousands of known proteins, any of which can be modified to include one or more unnatural amino acids that comprise a 1,2 aminothiol group, e.g., by tailoring any available mutation methods to include one or more appropriate selector codons in a relevant translation system. Common sequence repositories for known proteins include GenBank, EMBL, DDBJ and the NCBI. Other repositories can easily be identified by searching the internet.

Typically, the proteins are, e.g., at least 60%, at least 70%, at least 75%, at least 80%, at least 90%, at least 95%, or at least 99% or more identical to any available protein (e.g., a therapeutic protein, a diagnostic protein, an industrial enzyme, or portion thereof, and the like), and they comprise one or more unnatural amino acids of the invention. In some embodiments, polypeptides that can comprise at least one unnatural amino acid that comprises a 1,2 aminothiol group include a Z domain of Staphylococcal protein A, an SH3 domain, and c-Crk.
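
The identity thresholds above can be made concrete with a small calculation. The text does not specify how percent identity is to be computed, so the Python sketch below makes its own assumptions: the two sequences are already aligned to equal length, columns containing a gap ("-") are excluded from the count, and the unnatural amino acid is written as "X". The function name and both example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence has a gap character.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


reference = "MKTAYIAKQR"  # hypothetical reference protein fragment
variant = "MKTAXIAKQR"    # same fragment with "X" marking the unnatural amino acid

identity = percent_identity(reference, variant)
print(identity)          # 90.0
print(identity >= 60.0)  # True: meets the lowest threshold listed above
```

Other conventions (for example, counting gapped columns as mismatches) give different values, so the convention used should be stated whenever a threshold is applied.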

A classical “Z domain” is a 58-residue three-helix bundle derived from an IgG Fc-binding domain of Staphylococcus aureus Protein A. Randomization and substitution of the surface-exposed amino acids comprising this scaffold protein, e.g., with an unnatural amino acid comprising a 1,2 aminothiol group, can be useful in generating affinity ligands capable of binding desired target proteins. Affinity ligands produced in this manner can become widespread tools in a variety of biotechnological and biomedical applications, e.g., affinity purification (Lamla, et al. (2004) “The Nano-tag, a streptavidin-binding peptide for the purification and detection of recombinant proteins.” Protein Expr Purif 33: 39-47), protein microarray technology (Renberg et al. (2005) “Affibody protein capture microarrays: Synthesis and evaluation of random and directed immobilization of affibody molecules.” Anal Biochem 341: 334-343), bioimaging (Wikman et al. (2004) “Selection and characterization of HER2/neu-binding affibody ligands.” Protein Eng Des Sel 17: 455-462), enzyme inhibition (Amstutz et al. (2005) “Intracellular kinase inhibitors selected from combinatorial libraries of designed ankyrin repeat proteins.” J Biol Chem 280: 24715-24722), and potential targeted drug delivery (Heyd et al. (2003) “In vitro evolution of the binding specificity of neocarzinostatin, an enediyne-binding chromoprotein.” Biochemistry 42: 5674-5683; Nicaise et al. (2004) “Affinity transfer by CDR grafting on a nonimmunoglobulin scaffold.” Protein Sci 13: 1882-1891).

Src homology 3 (SH3) domains are non-catalytic protein modules that have been identified in hundreds of signaling proteins in diverse eukaryotic species ranging from yeast to human (Mayer (2001) “SH3 domains: complexity in moderation.” Journal Cell Sci 114: 1253-1263; Tong, et al. (2002) “A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules.” Science 295: 321-324). Members of the SH3 family, which typically comprise 50-70 amino acid residues, recognize specific proline-rich sequence motifs in target proteins and act as molecular adhesives that mediate protein-protein interactions in a variety of biological processes. SH3 domains play critical roles in the formation of multiprotein complexes, the formation of molecular networks responsible for signal transduction, and in the regulation of enzyme activity and cytoskeletal organization (reviewed in Morton, et al. (1994) “SH3 domains: Molecular Velcro.” Curr Biol 4: 615-617; Mayer (2001) “SH3 domains: complexity in moderation.” Journal Cell Sci 114: 1253-1263). The versatile nature of SH3 domains in target protein recognition suggests that these modular domains are adaptable and, like Z domain, can also find use as affinity reagents that can be tailored to generate ligands that possess prescribed specificities. Substitution of amino acids in an SH3 domain with, e.g., an unnatural amino acid comprising a 1,2 aminothiol group, can be useful in tailoring such affinity ligands.

c-Crk is a member of a family of signaling proteins whose modular domain architecture consists largely of an Src homology 2 domain (SH2) followed by two SH3 domains. Crk family proteins are widely expressed and mediate the formation of signal transduction protein complexes in response to a variety of extracellular stimuli, including growth and differentiation factors (reviewed in Feller (2001) “Crk family adaptors-signaling complex formation and biological roles.” Oncogene 20: 6348-71). The upstream signaling partners which selectively bind to c-Crk include various receptors (Sorokin, et al. (1998) “Crk protein binds to PDGF receptor and insulin receptor substrate-1 with different modulating effects on PDGF- and insulin-dependent signaling pathways.” Oncogene 16: 2425-2434; Furge, et al. (2000) “Met receptor tyrosine kinase: enhanced signaling through adapter proteins.” Oncogene 19: 5582-9) and large multisite docking proteins (Feller, et al. (2006) “Potential disease targets for drugs that disrupt protein-protein interactions of Grb2 and Crk family adaptors.” Curr Pharm Des 12: 529-548; Huang, et al. (2006) “The docking protein Cas links tyrosine phosphorylation signaling to elongation of cerebellar granule cell axons.” Mol Bio Cell 17: 3187-3196), while several protein kinases and guanine nucleotide release proteins (GNRPs) have been suggested to function downstream of c-Crk to affect, e.g., cell motility, adhesion, and cell growth regulation (Tanaka, et al. (1994) “C3G, a Guanine Nucleotide-Releasing Protein Expressed Ubiquitously, Binds to the Src Homology 3 Domains of CRK and GRB2/ASH Proteins.” Proc Natl Acad Sci USA 91: 3443-3447; Polte, et al. (1997) “Complexes of Focal Adhesion Kinase (FAK) and Crk-associated Substrate (p130^(Cas)) Are Elevated in Cytoskeleton-associated Fractions following Adhesion and Src Transformation.” Journal Biol Chem 272: 5501-5509, reviewed in Feller (2001) “Crk family adaptors-signaling complex formation and biological roles.” Oncogene 20: 6348-71). The SH2 domains of c-Crk can interact with upstream signaling molecules (Anafi, et al. (1997) “SH2/SH3 Adaptor Proteins Can Link Tyrosine Kinases to a Ste20-related Protein Kinase, HPK1.” J Biol Chem 44: 27804-27811), whereas the SH3 domains of c-Crk are critical for the coupling of this adaptor protein to effector molecules and for the targeting of signaling complexes to discrete sites within the cell (Bar-Sagi, et al. (1993) “SH3 domains direct cellular localization of signaling molecules.” Cell 74: 83-91). Substitution of amino acids in the SH2 and SH3 domains of Crk with, e.g., a UAA comprising a 1,2 aminothiol group, can produce Crk variants with altered target protein specificities and/or Crk variants that respond to altered upstream stimuli.

Other examples of therapeutic, diagnostic, and other polypeptides that can comprise at least one unnatural amino acid that comprises a 1,2 aminothiol group, e.g., a 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid residue, include, but are not limited to, those in International Publications WO 2004/094593, filed Apr. 16, 2004, entitled “Expanding the Eukaryotic Genetic Code;” and, WO 2002/085923, entitled “IN VIVO INCORPORATION OF UNNATURAL AMINO ACIDS.” Such proteins and polypeptide domains include aldosterone receptor, alpha-1 antitrypsin, angiostatin, antibody fragments, antihemolytic factor, apolipoprotein, apoprotein, atrial natriuretic factor, an atrial peptide, calcitonin, CC chemokine, CD40, CD40 ligand, CD44, collagen, colony stimulating factor (CSF), complement factor 5a, complement inhibitor, complement receptor 1, corticosterone, C-X-C chemokine, a cytokine, D31065, DHFR, ENA-78, estrogen receptor, epidermal growth factor (EGF), an epithelial neutrophil activating peptide, epithelial Neutrophil Activating Peptide-78, erythropoietin (EPO), exfoliating toxin, Factor VII, Factor VIII, Factor IX, Factor X, fibrinogen, fibroblast growth factor (FGF), fibronectin, Fos, GCP-2, G-CSF, glucocerebrosidase, GM-CSF, gonadotropin, Gro-a, Gro-b, Gro-c, GROα, GROβ, GROγ, a growth factor, a growth factor receptor, HCC1, a hedgehog protein, hemoglobin, hepatocyte growth factor (HGF), human serum albumin (HSA), human growth hormone (HGH), hyalurin, I309, ICAM-1, ICAM-1 receptor, IFN-α, IFN-β, IFN-γ, IGF-I, IGF-II, IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, inflammatory molecules, insulin, insulin-like growth factor (IGF), an interferon, an interleukin, IP-10, Jun, keratinocyte growth factor (KGF), Lactoferrin, LDL receptor, leukemia inhibitory factor, LFA-1, LFA-1 receptor, luciferase, MCP-1, Met, MGSA, MIG, MIP1-α, MIP1-β, MIP1-δ, monocyte chemoattractant protein-1, monocyte chemoattractant protein-2, monocyte chemoattractant protein-3, Mos, Myb, Myc, NAP-2, NAP-4, Neurturin, neutrophil inhibitory factor (NIF), an oncogene product, oncostatin M, osteogenic protein-1, p53, parathyroid hormone, PD-ECSF, PDGF, a peptide hormone, PF4, pleiotropin, progesterone receptor, Protein A, Protein G, pyrogenic exotoxins A, B, or C, R83915, R91733, Raf, RANTES, Ras, Rel, relaxin, renin, SCF/c-kit, SDF-1, SEA, SEB, SEC1, SEC2, SEC3, SED, SEE, a signal transduction molecule, soluble complement receptor I, soluble I-CAM 1, soluble interleukin receptor, soluble TNF receptor, somatomedin, somatostatin, a somatotropin, a Staphylococcal enterotoxin, a steroid hormone receptor, streptokinase, a superantigen, superoxide dismutase (SOD), T39765, T58847, T64262, Tat, a testosterone receptor, TGF-α, TGF-β, thymosin alpha 1, tissue plasminogen activator, Toxic Shock Syndrome toxin, transcriptional activators, transcriptional repressors, a tumor growth factor (TGF), tumor necrosis factor (TNF), TNF alpha, TNF beta, a tumor necrosis factor receptor (TNFR), urokinase, vascular endothelial growth factor (VEGF), VCAM-1 protein, and VLA-4 protein.

EXAMPLES

The following examples are offered to illustrate, but not to limit, the claimed invention. It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims.

SYNTHESIS OF 2-AMINO-3-(4-(2-AMINO-3-MERCAPTOPROPANAMIDO)PHENYL)PROPANOIC ACID

The materials used in the synthesis of 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid were obtained from the following sources: Water was taken from a Milli-Q ultra pure water purification system (Millipore). Peptide synthesis-grade dimethylformamide (DMF) was purchased from Alfa Aesar. N-(tert-butoxycarbonyl)-4-aminophenylalanine (Boc-p-amino-Phe-OH) was purchased from Bachem. N-(tert-butoxycarbonyl)-S-(triphenylmethyl)cysteine (Boc-Cys(Trt)-OH) and 1-hydroxybenzotriazole hydrate (HOBt) were purchased from Novabiochem. HPLC-grade acetonitrile was purchased from Fisher Scientific. Deuterated solvents, which were used to solubilize synthesized compounds in preparation for NMR characterization, were purchased from Cambridge Isotope Laboratories Inc. Other commercial reagents were purchased from Acros Organics and were used without further purification.

High-resolution mass spectra were measured on an Agilent 6210 Time of Flight Mass Spectrometer. Preparative HPLC was run on a Hitachi (D-7000 HPLC system) instrument using a preparative column (Grace Vydac “Protein & Peptide C18”, 250×22 mm, 10-15 μm particle size, flow rate 8 mL/min). Detection of the signal was achieved with either a photodiode array or a UV detector at a wavelength of λ=260 nm. Eluents A (0.1% trifluoroacetic acid (TFA) in water) and B (0.1% TFA in acetonitrile) were used in a linear gradient.

Two reaction schemes were followed to synthesize 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid. In the first reaction scheme, shown in FIG. 2, Compound 1, Boc-Cys(Trt)-OH (Trt=trityl; 1.6 g, 3.53 mmol), 1-(3-dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride (EDC*HCl) (600 mg, 3.9 mmol), and HOBt (500 mg, 3.9 mmol) were dissolved in anhydrous DMF (35 mL). N,N-diisopropylethylamine (DIEA) (668 μL, 3.9 mmol) was added and the reaction mixture was stirred under argon for 5 min, followed by the addition of Compound 2, Boc-p-amino-Phe-OH (900 mg, 3.21 mmol). After stirring for 12 h, the solvent was removed under vacuum, and the residue was purified by column chromatography (eluent: first 3:1 v/v hexane/ethyl acetate, then 2:1 v/v hexane/ethyl acetate) to afford Compound 3, 2-(tert-butoxycarbonylamino)-3-(4-(2-(tert-butoxycarbonylamino)-3-(tritylthio)propanamido)phenyl)propanoic acid, as a white solid, which was used directly in the next synthesis step (R_(F) (eluent: 2:1 v/v hexane/ethyl acetate)=0.13).

In the second reaction scheme, shown in FIG. 3, Compound 3 Boc-Cys(Trt)-OH (1.6 g, 3.53 mmol) was dissolved in a mixture of TFA (trifluoroacetic acid), triisopropylsilane (TIS), thioanisole (TA) and water (50 mL, 85:5:5:5 by volume) and stirred for 20 min. The solvent was removed under vacuum, and the residue was purified by preparative reversed-phase HPLC to afford, after lyophilization, Compound 5 as a white solid. NMR spectra of Compound 5 are as follows: ¹H NMR (500 MHz, MeOH-d₄): δ 3.05 (m, 2H, CH₂), 3.15 (m, 2H, CH₂), 4.17 (m, 1H, HSCH₂C(NH₂)H), 4.22 (m, 1H, HCNH₂COOH), 7.29 (d, J=8.5 Hz, 2H, 2×aromatic H), 7.62 (d, J=8.5 Hz, 2H, 2×aromatic H); ¹³C NMR (125 MHz, MeOH-d₄): δ 26.4, 36.9, 55.2, 56.8, 121.9, 131.2, 132.2, 138.6, 166.8, 171.3. These results indicate that Compound 5 possesses the chemical structure of the unnatural amino acid 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid. The yield of Compound 5 was 23% over the two synthesis steps.

The results of high-resolution mass spectrometry (electrospray sample ionization-time of flight), or HRMS (ESI-TOF), analysis of Compound 5 are as follows: m/z calculated for C₁₂H₁₇N₃O₃S [M+H]⁺: 284.1063; found: 284.1069, indicating that the synthesized compound exhibited the expected molecular mass.
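The calculated m/z above can be reproduced from the molecular formula. The following is a minimal sketch, not part of the original procedure: it assumes standard monoisotopic atomic masses and an electron mass of 0.00054858 u, and the names `MONO`, `monoisotopic_mass`, and `ppm_error` are illustrative only.

```python
# Monoisotopic masses (u) of the most abundant isotopes (standard reference values).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
        "O": 15.99491462, "S": 31.97207117}
ELECTRON = 0.00054858  # electron mass in u

def monoisotopic_mass(formula):
    """Sum the monoisotopic masses for a formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())

# Neutral Compound 5, C12H17N3O3S, as stated in the text.
neutral = {"C": 12, "H": 17, "N": 3, "O": 3, "S": 1}
m_neutral = monoisotopic_mass(neutral)

# [M+H]+ : add one hydrogen atom and remove one electron.
mz_calc = m_neutral + MONO["H"] - ELECTRON

# Deviation of the reported "found" value from the calculated value, in ppm.
mz_found = 284.1069
ppm_error = (mz_found - mz_calc) / mz_calc * 1e6

print(f"calculated [M+H]+ = {mz_calc:.4f}")
print(f"deviation of found value = {ppm_error:.1f} ppm")
```

Under these assumptions the calculated value rounds to the 284.1063 quoted in the text, and the reported value of 284.1069 lies within a few ppm of it.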

SELECTION FOR A M. JANNASCHII-DERIVED ORTHOGONAL AMINOACYL-tRNA SYNTHETASE (O-RS) SPECIFIC FOR THE UNNATURAL AMINO ACID 2-AMINO-3-(4-(2-AMINO-3-MERCAPTOPROPANAMIDO)PHENYL)PROPANOIC ACID

Based on an analysis of the X-ray crystal structure of the Methanococcus jannaschii tyrosyl-tRNA synthetase with its cognate amino acid or aminoacyl adenylate (Kobayashi, et al. (2003) “Structural basis for orthogonal tRNA specificities of tyrosyl-tRNA synthetases for genetic code expansion.” Nature Struct Biol 10: 425-432; Brick, et al. (1989) “Structure of tyrosyl-tRNA synthetase refined at 2.3 Å resolution interaction of the enzyme with the tyrosyl adenylate intermediate.” J Mol Biol 208: 83-98), a library of synthetase variants was generated by randomizing 10 amino acid residues: Tyr32 to codon RGK (Gly, Arg or Ser), Leu65 to codon NNT (Ala, Arg, Asn, Asp, Cys, Gly, His, Ile, Leu, Phe, Pro, Ser, Thr, Tyr or Val), Ala67 to codon GST (Ala or Gly), His70 to codon NNT, Phe108 to codon NNK (all amino acids), Gln109 to codon NNK, Tyr114 to codon VNT (Ala, Arg, Asn, Asp, Gly, His, Ile, Leu, Pro, Ser, Thr or Val), Asp158 to codon RST (Ala, Gly, Ser or Thr), Leu162 to codon VVK (Ala, Arg, Asn, Asp, Gln, Glu, Gly, His, Lys, Pro, Ser or Thr) and Ile159 to codon NNT. This library was constructed by overlap extension polymerase chain reaction (PCR) using synthetic degenerate oligonucleotide primers to introduce the mutations described above. These methods are described in more detail in, e.g., Xie, et al. (2005) “An expanding genetic code.” Methods 36: 227-238. The PCR-derived synthetase variants were cloned into the pBK plasmid and transformed into E. coli strain DH10B. The theoretical diversity of this library is 2.9×10¹⁰, based on codon diversity, and 4.7×10⁹, based on amino acid diversity. In practice, a library diversity of 1×10¹⁰ was achieved.
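The theoretical diversities quoted above follow from multiplying, across the ten randomized positions, the number of codons (or distinct amino acids) that each degenerate codon encodes. The sketch below illustrates that arithmetic under two assumptions that are not stated in the text: the standard genetic code is used, and stop codons (the amber codon reachable through NNK) are excluded from the amino acid count. The names `expand`, `amino_acids`, and `LIBRARY` are illustrative only.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes needed for the degenerate codons named in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "K": "GT", "S": "CG", "V": "ACG", "N": "ACGT"}

def expand(degenerate):
    """List every concrete codon encoded by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate))]

def amino_acids(degenerate):
    """Distinct amino acids encoded by a degenerate codon, stop codons excluded."""
    return {CODON_TABLE[c] for c in expand(degenerate)} - {"*"}

# Randomized positions and degenerate codons as listed in the text.
LIBRARY = {"Tyr32": "RGK", "Leu65": "NNT", "Ala67": "GST", "His70": "NNT",
           "Phe108": "NNK", "Gln109": "NNK", "Tyr114": "VNT", "Asp158": "RST",
           "Ile159": "NNT", "Leu162": "VVK"}

codon_diversity = prod(len(expand(c)) for c in LIBRARY.values())
aa_diversity = prod(len(amino_acids(c)) for c in LIBRARY.values())

print(f"codon diversity      = {codon_diversity:.1e}")
print(f"amino acid diversity = {aa_diversity:.1e}")
```

With these assumptions the products come to roughly 2.9×10¹⁰ codon combinations and 4.7×10⁹ amino acid combinations, in line with the figures given in the text.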

Selections for an orthogonal synthetase (O-RS) capable of charging its cognate orthogonal tRNA (O-tRNA) with 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid were performed as previously described in, e.g., Xie, et al. (2005) “An expanding genetic code.” Methods 36: 227-238. Colonies were selected on LB agar plates that contained 1 mM 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid. The plates were prepared with a 100 mM stock solution of the unnatural amino acid that was filtered through a 0.2 μm syringe filter and stored at −20° C. (2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid is a white powder that dissolves well in water.) After 3 rounds of positive selection that were alternated with 2 rounds of negative selection, 4×96 colonies were picked and spotted onto 3 sets of LB plates containing 20, 40, or 50 μg/ml chloramphenicol; and onto 3 sets of LB plates containing 1 mM 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid and 50, 66 or 110 μg/ml chloramphenicol. These spottings were done to estimate growth. Only one clone exhibited the desired phenotype, e.g., chloramphenicol resistance and growth only on media supplemented with 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid (plate 2, position H8). The clone grew well on an LB agar plate containing 1 mM of the unnatural amino acid and 66 μg/ml chloramphenicol and exhibited fair growth on LB agar with 1 mM of the unnatural amino acid and 110 μg/ml chloramphenicol. The clone did not grow on plates in the absence of 1 mM 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid. The plasmid encoding the putative desired O-RS was isolated and sequenced with the following primers: F53, e.g., SEQ ID NO: 4, CCTGATATGAATAAATTGCAGTTTC; F63, e.g., SEQ ID NO: 5, GTTGTTTACGCTTTGAGGAAT; and F67, e.g., SEQ ID NO: 6, GCGGAGCCTATGGAAAA. The MjTyrRS mutant that was obtained has the mutations shown in Table 2 below.

TABLE 2

Amino Acid    WT RS    WT RS         Mutant RS    Mutant RS
Position      codon    amino acid    codon        amino acid
32            TAC      Tyr           GGG          Gly
65            TTG      Leu           GAT          Asp
67            GCT      Ala           GCT          Ala (silent mutation)
70            CAC      His           ATT          Ile
76*           AAA      Lys           AAG          Lys (silent mutation)
84*           AAA      Lys           GAA          Glu
108           TTC      Phe           ACG          Thr
109           CAG      Gln           TAT          Tyr
114           TAT      Tyr           CGT          Arg
158           GAT      Asp           GGT          Gly
159*          ATT      Ile           ATT          Ile (silent)
162           TTA      Leu           GAG          Glu
250*          GAA      Glu           GGA          Gly

* This position was not randomized in the library; it is posited to be a spontaneous mutation.
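The codon and amino acid columns of Table 2 can be cross-checked by translating each codon. The sketch below is an illustration only, assuming the standard genetic code; `TABLE_2`, `translate`, `mismatches`, and `substitutions` are hypothetical names, and the rows are transcribed from the table above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order for each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
THREE_LETTER = {"A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
                "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
                "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
                "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
                "*": "Stop"}

def translate(codon):
    """Translate one DNA codon to its three-letter amino acid abbreviation."""
    return THREE_LETTER[CODON_TABLE[codon]]

# (position, WT codon, WT amino acid, mutant codon, mutant amino acid)
TABLE_2 = [
    (32, "TAC", "Tyr", "GGG", "Gly"), (65, "TTG", "Leu", "GAT", "Asp"),
    (67, "GCT", "Ala", "GCT", "Ala"), (70, "CAC", "His", "ATT", "Ile"),
    (76, "AAA", "Lys", "AAG", "Lys"), (84, "AAA", "Lys", "GAA", "Glu"),
    (108, "TTC", "Phe", "ACG", "Thr"), (109, "CAG", "Gln", "TAT", "Tyr"),
    (114, "TAT", "Tyr", "CGT", "Arg"), (158, "GAT", "Asp", "GGT", "Gly"),
    (159, "ATT", "Ile", "ATT", "Ile"), (162, "TTA", "Leu", "GAG", "Glu"),
    (250, "GAA", "Glu", "GGA", "Gly"),
]

# Rows whose stated amino acids disagree with the translated codons.
mismatches = [row for row in TABLE_2
              if translate(row[1]) != row[2] or translate(row[3]) != row[4]]

# Positions where the encoded amino acid actually changes.
substitutions = [(pos, wt_aa, mut_aa)
                 for pos, _, wt_aa, _, mut_aa in TABLE_2 if wt_aa != mut_aa]

print("rows with inconsistent codon/amino acid:", mismatches)
print("amino acid substitutions:", substitutions)
```

Every row translates consistently, and ten positions carry an amino acid change; the remaining three rows (67, 76, 159) encode the same amino acid in the wild-type and mutant synthetases.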

VERIFYING THE SPECIFICITY AND FIDELITY OF THE INCORPORATION OF 2-AMINO-3-(4-(2-AMINO-3-MERCAPTOPROPANAMIDO)PHENYL)PROPANOIC ACID INTO Z-DOMAIN PROTEIN IN E. COLI BY THE MUTANT M. JANNASCHII 2-AMINO-3-(4-(2-AMINO-3-MERCAPTOPROPANAMIDO)PHENYL)PROPANOIC ACID tRNA SYNTHETASE (RS)

Experiments to determine the efficiency and fidelity of incorporation of 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid into proteins were performed as previously described (Xie, et al. (2004) “The site-specific incorporation of p-iodophenylalanine for structure determination.” Nature Biotech 22: 1297-1301). Plasmid pBK, which encodes the mutant M. jannaschii 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid tRNA synthetase (RS), and plasmid pLeiZ, which encodes a C-terminal 6×His-tagged mutant Z-domain protein with an amber codon at amino acid position 7 and the mutRNA_(CUA) ^(Tyr), were cotransformed into both competent E. coli DH10B cells and competent E. coli GeneHog (Invitrogen) cells. The transformed cells were recovered in SOC for 1 hour prior to plating on LB agar plates containing 50 μg/mL kanamycin (to select for pBK transformants) and 34 μg/ml chloramphenicol (to select for pLeiZ transformants). The plates were then incubated at 37° C. for 14-18 hours.

A single kanamycin^(R) chloramphenicol^(R) colony was picked from the plates and used to inoculate 6 mL 2YT medium containing 50 μg/mL kanamycin and 34 μg/ml chloramphenicol. When the culture was grown to near saturation, 500 μL of this culture were used to inoculate 15 mLs of GMML medium containing 50 μg/mL kanamycin and 34 μg/ml chloramphenicol. GMML medium is a glycerol minimal medium comprising leucine (1×M9 salts (Sigma), 1 mM MgSO₄, 0.1 mM CaCl₂, 0.5 g/L NaCl, 0.3 mM L-leucine, 1% (vol/vol) glycerol). When the GMML culture was grown to near saturation, 8 mLs of this culture were used to inoculate 200 mLs of GMML medium containing 50 μg/mL kanamycin, 34 μg/ml chloramphenicol, and 1.0 mM 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid. A second culture of GMML medium containing 50 μg/mL kanamycin, 34 μg/ml chloramphenicol and no UAA was also inoculated and grown in parallel. When these 200 mL cultures reached an OD₆₀₀ of approximately 0.5-0.6, isopropyl-β-D-thiogalactopyranoside (IPTG) was added to a final concentration of 1 mM to induce expression of Z-domain protein. After 4 hours, the cells were pelleted via centrifugation.

The cell pellets were resuspended in 6 mLs of Buffer B (100 mM NaH₂PO₄, 10 mM Tris-HCl, 8M urea, pH=8.0) and lysed via sonication (3×30 second sonication cycles followed by 60 second incubations on ice). The sonicated whole cell lysates were centrifuged at 10,000 g for 30 minutes at 25° C., and the recovered supernatants were mixed with 1 mL of 50% Ni-NTA agarose slurry. The supernatant/slurry mixtures were shaken gently at room temperature for 60 minutes and then each loaded onto a column (available from Qiagen). The column flow-through was collected, and the columns were then washed twice with 4 mLs of Buffer C (100 mM NaH₂PO₄, 10 mM Tris-HCl, 8M urea, pH=6.8, buffer prepared right before use), and the recombinant Z-domain protein was eluted from each column with 5×0.5 mL Buffer E (100 mM NaH₂PO₄, 10 mM Tris-HCl, 8M urea, pH=4.5, buffer prepared right before use). All washes and elutions were also collected.

The eluted fractions, column washes, and flow-through were assayed by SDS PAGE (10-20% polyacrylamide, or 15% Tris Glycine PAGE), and the gel was then stained with GelCode (Pierce) or silver stain to visualize the results of the Z-domain protein expression and purification experiments described above. A GelCode-stained SDS PAGE gel on which purified protein samples derived from cultures grown in the absence (lane 2) or presence (lane 3) of 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid were run is shown in FIG. 4. Molecular weight markers (Benchmark™, available from Invitrogen) were run in lane 1. FIG. 4 shows that full-length Z-domain protein (indicated with an arrow) was expressed only in cells cultured in media to which 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid was added (FIG. 4, lane 3). Results of ESI-MS analysis show that the other major band in FIG. 4, lanes 2 and 3, is most likely an E. coli peptidyl-prolyl-cis-trans-isomerase, an E. coli protein comprising a long sequence of histidine residues. These results indicate that the 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid RS recognizes this UAA with high specificity and does not recognize endogenous amino acids in E. coli to any significant degree.

Electrospray ionization mass spectrometry (ESI-MS) analysis of both the purified UAA and wild type Z-domain proteins was performed at the Genomics Institute of the Novartis Foundation (GNF) to confirm the incorporation of 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid into the UAA Z-domain mutant. The difference in mass between the wt Z-domain protein, which comprises a tyrosine residue (molecular mass=181.07 Da) at amino acid position 7, and UAA Z-domain, which comprises a 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid (molecular mass=283.10 Da) at amino acid position 7, is expected to be 102.03 Da. As shown in Table 3 below, the observed average masses of both the WT and UAA proteins are in close agreement with their calculated masses. N-terminal methionine cleavage products of full-length mutant Z-domain protein and wt Z-domain protein, and their acetylation products, were also observed (see Table 3). The presence of the UAA at amino acid position 7 is posited to impair cleavage of the first methionine, which is mostly “off” in the WT protein and mostly “on” in the mutant variant comprising 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid. Taken together, SDS PAGE and ESI-MS results indicate that 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid is incorporated into Z domain with high efficiency and fidelity.

TABLE 3

First        N-terminus    wt Z-domain    wt Z-domain    UAA Z-domain    UAA Z-domain
Methionine   Acetylated    Expected Mass  Observed Mass  Expected Mass   Observed Mass
on           yes           7971.18        -              8073.21         8074
on           no            7928.18        -              8030.21         8031 (major peak)
off          yes           7839.99        7840           7942.02         -
off          no            7796.99        7797           7899.02         7900 (major peak)
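The expected masses in Table 3 are related by the 102.03 Da difference between tyrosine and the unnatural amino acid. As a rough cross-check, and only as a sketch, the expected UAA Z-domain masses can be regenerated from the wt values and compared with the observed peaks. The dictionary names below are illustrative, and the values are transcribed from Table 3 and the preceding paragraph.

```python
# Molecular masses (Da) of the free amino acids, as given in the text.
TYR_MASS = 181.07
UAA_MASS = 283.10
mass_shift = round(UAA_MASS - TYR_MASS, 2)  # expected Tyr -> UAA difference

# Keys: (first methionine, N-terminus acetylated); values: expected wt mass (Da).
wt_expected = {("on", "yes"): 7971.18, ("on", "no"): 7928.18,
               ("off", "yes"): 7839.99, ("off", "no"): 7796.99}

# Expected UAA Z-domain masses obtained by adding the Tyr -> UAA shift.
uaa_expected = {form: round(mass + mass_shift, 2)
                for form, mass in wt_expected.items()}

# Observed UAA Z-domain masses from Table 3 (forms without a peak are omitted).
uaa_observed = {("on", "yes"): 8074, ("on", "no"): 8031, ("off", "no"): 7900}

# Observed minus expected mass for each detected form.
deviations = {form: round(obs - uaa_expected[form], 2)
              for form, obs in uaa_observed.items()}

for form, dev in deviations.items():
    print(form, "expected", uaa_expected[form], "observed", uaa_observed[form],
          "deviation", dev)
```

Each regenerated value matches the "Expected Mass" column for the UAA Z-domain, and the observed peaks fall within about 1 Da of those values.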

CONJUGATING A PROTEIN COMPRISING 2-AMINO-3-(4-(2-AMINO-3-MERCAPTOPROPANAMIDO)PHENYL)PROPANOIC ACID TO A FLUORESCENT DYE THIOESTER VIA NATIVE CHEMICAL LIGATION

The fluorescent dye fluorescein-MES-thioester (molecular weight=501.03 Da) was conjugated via native chemical ligation (NCL) to purified Z-domain protein comprising the unnatural amino acid (UAA) 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid at amino acid position 7. In an NCL reaction, the aminothiol moiety that reacts with a thioester moiety to yield a peptide bond at the reaction site is typically provided by a polypeptide comprising an N-terminal cysteine residue. As such, the modifications that can be made to a polypeptide via NCL are limited to the polypeptide's N-terminus. Incorporation of the unnatural amino acid 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid into a polypeptide permits modifications via NCL reactions to be made at any amino acid position in a polypeptide.

The purified Z-domain comprising 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid at amino acid position 7 (UAA Z-domain) was prepared as described above and was resuspended in a buffer comprising 20 mM Na-phosphate, 100 mM NaCl, 1 mM DTT. The pH of this buffer was adjusted to 7.4 at room temperature. The resuspended Z-domain protein was dialyzed at 4° C. (Slidealyzer, Pierce, 0.1-0.5 ml capacity, 3500 MWCO) against 4×1 liter of a buffer comprising 20 mM Na-phosphate, 100 mM NaCl, pH=7.4. The pH of this buffer was adjusted at room temperature.

The fluorescein-MES-thioester (FIG. 5) was prepared from isomer-pure carboxyfluorescein (obtained from EMD biochemicals) and MES-sodium salt. The final product was a MES (2-mercaptoethanesulfonic acid) thioester, which was purified via HPLC and characterized via LCMS.

A two-fold molar excess of fluorescein-MES-thioester (in water) was added to the dialyzed UAA Z-domain. The ligation mixture was shaken for 16 hours at 4° C. The crude NCL reaction mixture was run on an SDS gel in non-reducing buffer (MES buffer, available from Invitrogen) and visualized with SimplyBlue™ SafeStain (a Coomassie-like staining method available from Invitrogen), as shown in FIG. 6. Molecular weight markers (Benchmark™, available from Invitrogen) were run in lane 1, unreacted UAA Z-domain protein was run in lane 2, and a sample of the ligation mixture was run in lane 3. The higher molecular weight species in lanes 2 and 3 are UAA Z-domain dimers. The gel in FIG. 6 was also visualized under UV light (FIG. 7). The band in FIG. 7, lane 3 (indicated by *) that corresponds to monomeric UAA-comprising Z-domain protein in FIG. 6, lane 3 (also indicated by *) fluoresced under UV light. This result demonstrates that the fluorescent dye fluorescein-MES-thioester was conjugated via NCL to purified Z-domain protein comprising the unnatural amino acid (UAA) 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid at amino acid position 7.

SITE SPECIFIC PEGYLATION OF Z-DOMAIN PROTEIN COMPRISING A 2-AMINO-3-(4-(2-AMINO-3-MERCAPTOPROPANAMIDO)PHENYL)PROPANOIC ACID RESIDUE AT AMINO ACID POSITION 7

Reaction of an aldehyde with the unnatural amino acid 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid should produce a stable adduct according to the native chemical ligation (NCL) reaction scheme shown in FIG. 8. Polyethylene glycol (PEG) aldehyde derivatives of two different molecular weights (PEG-2000 and PEG-5000, both purchased from Fluka) were obtained and used in NCL reactions with Z-domain protein comprising a 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid residue at amino acid position 7 (UAA Z-domain), as described below. Reactions were set up in ratios of 1:1, 1:3, 1:66, or 1:1320 UAA Z-domain: PEG-aldehyde 2000 or in ratios of 1:1, 1:3, 1:26.4 or 1:528 UAA Z-domain: PEG-aldehyde 5000 in aqueous solutions comprising 25 mM HEPES buffer pH=7.4. Each reaction was set up in duplicate. One set of reactions was performed in the presence of 12 mM DTT, and the other was performed in the absence of DTT. All reaction mixtures were shaken overnight (approximately 12-15 hours) at room temperature and run on an SDS gel shown in FIG. 10.
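The two highest molar ratios for each reagent look unrelated at first glance. The sketch below checks a simple arithmetic relationship between them, assuming nominal molecular weights of 2000 and 5000 g/mol for the two PEG aldehydes (an assumption based only on the reagent names). The text does not state why these ratios were chosen, so the equal-mass reading is an observation from the numbers rather than a documented design choice.

```python
import math

# Nominal molecular weights (g/mol); assumed from the reagent names only.
PEG_MW = {"PEG-aldehyde 2000": 2000.0, "PEG-aldehyde 5000": 5000.0}

# Molar excess of PEG aldehyde over UAA Z-domain, as listed in the text.
MOLAR_EXCESS = {"PEG-aldehyde 2000": [1, 3, 66, 1320],
                "PEG-aldehyde 5000": [1, 3, 26.4, 528]}

def peg_mass_per_mol_protein(reagent):
    """Grams of PEG aldehyde per mole of UAA Z-domain at each molar ratio."""
    return [ratio * PEG_MW[reagent] for ratio in MOLAR_EXCESS[reagent]]

mass_2000 = peg_mass_per_mol_protein("PEG-aldehyde 2000")
mass_5000 = peg_mass_per_mol_protein("PEG-aldehyde 5000")

for ratio_2000, ratio_5000, m2, m5 in zip(MOLAR_EXCESS["PEG-aldehyde 2000"],
                                          MOLAR_EXCESS["PEG-aldehyde 5000"],
                                          mass_2000, mass_5000):
    same = math.isclose(m2, m5)
    print(f"1:{ratio_2000} vs 1:{ratio_5000} -> {m2:.0f} g vs {m5:.0f} g, "
          f"equal mass: {same}")
```

Under the nominal-weight assumption, the 1:66 and 1:26.4 reactions, and likewise the 1:1320 and 1:528 reactions, contain the same mass of PEG aldehyde per mole of protein, whereas the 1:1 and 1:3 reactions are matched by molar ratio instead.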

As shown in FIG. 10, molecular weight markers (Benchmark, available from Invitrogen) were run in lane 2, UAA Z-domain was run in lane 6, UAA Z-domain that was reacted with PEG-aldehyde 2000 at a ratio of 1:1320 in the presence of 12 mM DTT was run in lane 7, and UAA Z-domain that was reacted with PEG-aldehyde 5000 at a ratio of 1:528 in the presence of 12 mM DTT was run in lane 8. These results indicate that Z-domain comprising 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid can be PEGylated at the site of the UAA with PEG aldehyde derivatives.

While the foregoing invention has been described in some detail for purposes of clarity and understanding, it will be clear to one skilled in the art from a reading of this disclosure that various changes in form and detail can be made without departing from the true scope of the invention. For example, all the techniques and apparatus described above can be used in various combinations. All publications, patents, patent applications, and/or other documents cited in this application are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication, patent, patent application, and/or other document were individually indicated to be incorporated by reference for all purposes.

1. A translation system, comprising: (a) an unnatural amino acid that comprises a 1,2 aminothiol group; (b) a first orthogonal aminoacyl-tRNA synthetase (O-RS); and, (c) a first orthogonal tRNA (O-tRNA); wherein the first O-RS preferentially aminoacylates the first O-tRNA with the unnatural amino acid.
2. The translation system of claim 1, wherein the unnatural amino acid is 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid.
3. The translation system of claim 1, wherein the unnatural amino acid that comprises a 1,2 aminothiol group is

wherein X is NH₂ or SH; wherein Y is SH when X is NH₂, or NH₂ when X is SH; wherein R is an H, CH₃, or CH₂CH₃; and, wherein N is an integer from 0 to 5.
4. The translation system of claim 1, wherein the unnatural amino acid that comprises a 1,2 aminothiol group is

wherein X is NH₂ or SH; wherein Y is SH when X is NH₂, or NH₂ when X is SH; wherein R is an H, CH₃, or CH₂CH₃; and, wherein N is an integer from 1 to 4.
5. The translation system of claim 1, wherein the unnatural amino acid that comprises a 1,2 aminothiol group is

wherein A is O, NH, NCH₃, or S; wherein X is NH₂ or SH; wherein Y is SH when X is NH₂, or NH₂ when X is SH; wherein R is an H, CH₃, or CH₂CH₃; and, wherein N is an integer from 0 to 6.
6. The translation system of claim 1, wherein the unnatural amino acid that comprises a 1,2 aminothiol group is

wherein X is NH₂ or SH; wherein Y is SH when X is NH₂, or NH₂ when X is SH; wherein R is an H, CH₃, or CH₂CH₃; and, wherein N is an integer from 1 to 6.
7. The translation system of claim 1, wherein the unnatural amino acid that comprises a 1,2 aminothiol group is

wherein X is NH₂ or SH; wherein Y is SH when X is NH₂, or NH₂ when X is SH; wherein R is an H, CH₃, or CH₂CH₃; and, wherein N is an integer from 1 to 8.
8. The translation system of claim 1, wherein the unnatural amino acid that comprises a 1,2 aminothiol group is

wherein X is NH₂ or SH; wherein Y is SH when X is NH₂, or NH₂ when X is SH; wherein R is an H, CH₃, or CH₂CH₃; wherein N is an integer from 1 to 6; wherein M is an integer from 1 to 6; and, wherein A is an O, NH, NCH₃, S,


9. The translation system of claim 1, wherein the first O-RS preferentially aminoacylates the first O-tRNA with the unnatural amino acid with an efficiency that is at least 50% of the efficiency observed for a translation system comprising the first O-tRNA, the unnatural amino acid, and an aminoacyl-tRNA synthetase comprising the amino acid sequence of SEQ ID NO: 1.
10. The translation system of claim 1, wherein the first O-RS comprises an amino acid sequence, selected from the group consisting of: an amino acid sequence set forth in SEQ ID NO: 1, and a conservative variant thereof, wherein the conservative variant comprises: an amino acid selected from the group consisting of: a) glycine at an amino acid position corresponding to amino acid 32 of SEQ ID NO: 1; b) aspartic acid at an amino acid position corresponding to amino acid 65 of SEQ ID NO: 1; c) isoleucine at an amino acid position corresponding to amino acid 70 of SEQ ID NO: 1; d) glutamic acid at an amino acid position corresponding to amino acid 84 of SEQ ID NO: 1; e) threonine at an amino acid position corresponding to amino acid 108 of SEQ ID NO: 1; f) tyrosine at an amino acid position corresponding to amino acid 109 of SEQ ID NO: 1; g) arginine at an amino acid position corresponding to amino acid 114 of SEQ ID NO: 1; h) glycine at an amino acid position corresponding to amino acid 158 of SEQ ID NO: 1; i) glutamic acid at an amino acid position corresponding to amino acid 162 of SEQ ID NO: 1; and, j) glycine at an amino acid position corresponding to amino acid 250 of SEQ ID NO: 1.
11. The translation system of claim 1, wherein the first O-tRNA is an amber suppressor tRNA, an ochre suppressor tRNA, an opal suppressor tRNA, or a tRNA that recognizes a four base codon, a rare codon, or a non-coding codon.
12. The translation system of claim 1, wherein the first O-tRNA comprises or is encoded by a polynucleotide sequence set forth in SEQ ID NO: 3.
13. The translation system of claim 1, comprising a nucleic acid encoding a polypeptide of interest, the nucleic acid comprising at least one selector codon, wherein the selector codon is recognized by the first O-tRNA.
14. The translation system of claim 13, wherein the nucleic acid encoding a polypeptide of interest encodes a polypeptide comprising a Z domain, a polypeptide comprising an SH3 domain, or a polypeptide homologous to c-Crk.
15. The translation system of claim 13, further comprising a second O-RS and a second O-tRNA, wherein the second O-RS preferentially aminoacylates the second O-tRNA with a second unnatural amino acid that is different from the unnatural amino acid that comprises the 1,2 aminothiol group, and wherein the second O-tRNA recognizes a selector codon that is different from the selector codon recognized by the first O-tRNA.
16. The translation system of claim 1, wherein the translation system comprises a cell.
17. The translation system of claim 16, wherein the cell is a mammalian cell, an insect cell, a bacterial cell, or an E. coli cell.
18. A method for producing a polypeptide comprising at least one unnatural amino acid that comprises a 1,2 aminothiol group at a selected position, the method comprising: (a) providing a translation system comprising: (i) an unnatural amino acid that comprises a 1,2-aminothiol group; (ii) a first orthogonal aminoacyl-tRNA synthetase (O-RS); (iii) a first orthogonal tRNA (O-tRNA), wherein the first O-RS preferentially aminoacylates the first O-tRNA with the unnatural amino acid that comprises a 1,2-aminothiol group; and, (iv) a nucleic acid of interest encoding the polypeptide of interest, wherein the nucleic acid comprises at least one selector codon that is recognized by the first O-tRNA; and, (b) incorporating the unnatural amino acid that comprises a 1,2 aminothiol group at the selected position in the polypeptide during translation of the polypeptide in response to the selector codon, thereby producing the polypeptide comprising the unnatural amino acid that comprises a 1,2 aminothiol group at the selected position.
19-24. (canceled)
25. A composition comprising a polynucleotide encoding an orthogonal aminoacyl tRNA synthetase (O-RS) that preferentially aminoacylates a cognate orthogonal tRNA (O-tRNA) with an unnatural amino acid that comprises a 1,2 aminothiol group.
26-30. (canceled)
31. A method of producing an orthogonal aminoacyl-tRNA synthetase that preferentially aminoacylates an O-tRNA with a 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid, the method comprising: a) mutating a wild-type aminoacyl-tRNA synthetase; and, b) selecting a resulting O-RS that preferentially aminoacylates the first O-tRNA with the 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)propanoic acid, thereby providing the first O-RS.
32. A method of synthesizing 2-amino-3-(4-(2-amino-3-mercaptopropanamido)phenyl)-propanoic acid, the method comprising: a) dissolving N-(tert-butoxycarbonyl)-S-(triphenylmethyl)cysteine, 1-(3-dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride, and 1-hydroxybenzotriazole hydrate in anhydrous dimethylformamide to produce solution 1, b) adding N,N-diisopropylethylamine to solution 1 to produce solution 2, c) adding N-(tert-butoxycarbonyl)-4-aminophenylalanine to solution 2 to produce solution 3, d) drying solution 3 to produce residue 1, e) purifying residue 1 to produce solid 1, f) dissolving solid 1 in a mixture comprising trifluoroacetic acid, triisopropylsilane, thioanisole and water to produce solution 4, g) drying solution 4 to produce residue 2, and; h) purifying residue 2, which comprises the 2-amino-3-(4-(2-amino-3-mercaptopropan-amido)phenyl)-propanoic acid.
33-37. (canceled)